Personalized Food Recommendations: Exploring Content-Based Methods
Jorge Miguel Tavares Soares de Almeida
Thesis to obtain the Master of Science Degree in
Telecommunications and Informatics Engineering
Supervisors: Prof. Pável Pereira Calado, Prof. Bruno Emanuel da Graça Martins
Examination Committee
Chairperson: Prof. Paulo Jorge Pires Ferreira
Supervisor: Prof. Pável Pereira Calado
Member of the Committee: Prof. João Miguel da Costa Magalhães
November 2015
Acknowledgments
I would like to acknowledge a few people for their help and availability during the course of my M.Sc. dissertation.
First, I would like to thank my dissertation supervisors, Prof. Pável Calado and Prof. Bruno Martins, for their guidance, knowledge, and constructive criticism, which greatly improved the quality of this work.
I would also like to thank Prof. Miguel Mira da Silva for the opportunity to be a part of the YoLP project, and for providing me with a research scholarship at INOV that supported my study of recommendation systems in the food domain.
Lastly, I would like to thank my parents for their continued support throughout the years, allowing me to focus on my academic studies and on completing my Master's Degree.
Resumo
Esta dissertação explora a aplicabilidade de métodos baseados em conteúdo para recomendações personalizadas no domínio alimentar. A recomendação neste domínio é uma área relativamente nova, existindo poucos sistemas implementados em ambiente real que se baseiam nas preferências de utilizadores. Métodos utilizados frequentemente noutras áreas, como o algoritmo de Rocchio na classificação de documentos, podem ser adaptados para recomendações no domínio alimentar. Com o objectivo de explorar métodos baseados em conteúdo na área de recomendação alimentar, foi desenvolvida uma plataforma para avaliar a aplicabilidade do algoritmo de Rocchio aplicado a este domínio. Para além da validação do algoritmo explorado neste estudo, foram efectuados outros testes, como o impacto do desvio padrão no erro de recomendação e a curva de aprendizagem do algoritmo.
Palavras-chave: Sistemas de Recomendação, Recomendação Baseada em Conteúdo, Comida, Receita, Aprendizagem Automática
Abstract
Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.
Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing
Contents
Acknowledgments iii
Resumo v
Abstract vii
List of Tables xi
List of Figures xiii
Acronyms xv
1 Introduction 1
1.1 Dissertation Structure 2
2 Fundamental Concepts 3
2.1 Recommendation Systems 3
2.1.1 Content-Based Methods 4
2.1.2 Collaborative Methods 9
2.1.3 Hybrid Methods 12
2.2 Evaluation Methods in Recommendation Systems 14
3 Related Work 17
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation 17
3.2 Content-Boosted Collaborative Recommendation 19
3.3 Recommending Food: Reasoning on Recipes and Ingredients 21
3.4 User Modeling for Adaptive News Access 22
4 Architecture 25
4.1 YoLP Collaborative Recommendation Component 25
4.2 YoLP Content-Based Recommendation Component 27
4.3 Experimental Recommendation Component 28
4.3.1 Rocchio's Algorithm using FF-IRF 28
4.3.2 Building the Users' Prototype Vector 29
4.3.3 Generating a rating value from a similarity value 29
4.4 Database and Datasets 31
5 Validation 35
5.1 Evaluation Metrics and Cross Validation 35
5.2 Baselines and First Results 36
5.3 Feature Testing 38
5.4 Similarity Threshold Variation 39
5.5 Standard Deviation Impact in Recommendation Error 42
5.6 Rocchio's Learning Curve 43
6 Conclusions 47
6.1 Future Work 48
Bibliography 49
List of Tables
2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe - ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Food.com rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10-Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using the Epicurious dataset 39
5.3 Lower similarity threshold variation test using the Food.com dataset 40
5.4 Upper similarity threshold variation test using the Epicurious dataset 40
5.5 Upper similarity threshold variation test using the Food.com dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes 44
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes 44
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones appear every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is therefore a topic of great interest.
In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. That work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed: the algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal¹ (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the related work discussed in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
21 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about users' needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, which is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of that term to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \tag{2.1} \]

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords become more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2} \]

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

\[ w_{i,j} = TF_{i,j} \times IDF_i \tag{2.3} \]
It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when assigning a weight to a term.
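As a concrete sketch of Eqs. (2.1)-(2.3), TF-IDF can be implemented in a few lines; the tokenized "recipes as documents" below are illustrative toy data, not taken from this work:

```python
from math import log

def tf_idf(docs):
    """TF-IDF weights following Eqs. (2.1)-(2.3): TF is normalized by the
    most frequent keyword in each document, and IDF is log(N / n_i).
    Returns one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = {}  # n_i: number of documents containing each term
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        max_freq = max(counts.values())
        weights.append({term: (f / max_freq) * log(n_docs / doc_freq[term])
                        for term, f in counts.items()})
    return weights

# Illustrative documents: ingredients play the role of words
docs = [["pasta", "tomato", "basil", "pasta"],
        ["pasta", "cream", "mushroom"],
        ["tomato", "bread", "garlic"]]
w = tf_idf(docs)
# In document 0, "pasta" has TF = 2/2 = 1 and occurs in 2 of the 3
# documents, so its weight is 1 * log(3/2)
```

Note that a term occurring in every document gets weight zero, reflecting the intuition that ubiquitous keywords carry no discriminative power.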
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

\[ w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K}(TF\text{-}IDF_{z,j})^2}} \tag{2.4} \]
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ Similarity(a, b) = \frac{\sum_k w_{k,a} \, w_{k,b}}{\sqrt{\sum_k w_{k,a}^2} \sqrt{\sum_k w_{k,b}^2}} \tag{2.5} \]
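A minimal implementation of the cosine similarity of Eq. (2.5) over sparse weight vectors might look as follows; the profile and recipe vectors are hypothetical examples:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity of Eq. (2.5) between two sparse weight vectors,
    each represented as a {keyword: weight} dict; missing keywords are
    treated as zero weights."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical user profile and recipe, as TF-IDF weight vectors
profile = {"tomato": 0.8, "basil": 0.5}
recipe = {"tomato": 0.4, "garlic": 0.9}
sim = cosine_similarity(profile, recipe)  # only "tomato" is shared
```

Any vector compared against itself yields a similarity of 1, while vectors with no keywords in common yield 0.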
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \(\vec{c_i} = (w_{1,i}, \ldots, w_{|T|,i})\) for each class c_i, with T being the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:

\[ w_{k,i} = \beta \sum_{d_j \in POS_i} \frac{w_{k,j}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{k,j}}{|NEG_i|} \tag{2.6} \]

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{k,j} is the TF-IDF weight for term k in document d_j. Parameters \(\beta\) and \(\gamma\) control the influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \(\vec{c_i}\) and the document vector \(\vec{d_j}\).
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
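The prototype-vector computation of Eq. (2.6) can be sketched as follows; the \(\beta\)/\(\gamma\) values and the toy positive and negative examples are illustrative assumptions only:

```python
def rocchio_prototype(pos_docs, neg_docs, beta=16, gamma=4):
    """Prototype vector for one class, following Eq. (2.6): a beta-weighted
    centroid of the positive examples minus a gamma-weighted centroid of
    the negative ones. The beta/gamma defaults are illustrative."""
    proto = {}
    for docs, coeff in ((pos_docs, beta / len(pos_docs)),
                        (neg_docs, -gamma / len(neg_docs))):
        for doc in docs:
            for term, w in doc.items():
                proto[term] = proto.get(term, 0.0) + coeff * w
    return proto

# Toy TF-IDF vectors of documents the user rated positively / negatively
pos = [{"tomato": 1.0, "basil": 0.5}, {"tomato": 0.8}]
neg = [{"cream": 0.9, "tomato": 0.2}]
proto = rocchio_prototype(pos, neg)
# "tomato": 16 * (1.0 + 0.8) / 2 - 4 * 0.2 / 1 = 14.4 - 0.8 = 13.6
```

A new document would then be scored against this prototype with the cosine similarity of Eq. (2.5), and assigned to the class with the highest score.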
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

\[ P(c|d) = \frac{P(c) \, P(d|c)}{P(d)} \tag{2.7} \]

When performing classification, each document d is assigned to the class c_j with the highest probability:

\[ \arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)} \tag{2.8} \]
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of words in a document. The second, typically referred to as the multinomial event model, accounts for the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \tag{2.9} \]

In the formula, N(d_i,t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document, \(t_k \in V_{d_i}\), are used.
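A small sketch of a multinomial Naive Bayes classifier in the spirit of Eq. (2.9); working in log space and adding Laplace (add-one) smoothing are extra assumptions, made so that products of small probabilities do not underflow and unseen terms do not zero out a class. The "liked"/"disliked" training data is hypothetical:

```python
from math import log

def train_multinomial_nb(docs, labels):
    """Train a multinomial Naive Bayes model in the spirit of Eq. (2.9).
    docs are lists of tokens; returns class priors, per-class term
    log-probabilities (Laplace-smoothed), and the vocabulary."""
    vocab = {t for d in docs for t in d}
    priors, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = log(len(class_docs) / len(docs))
        counts = {t: 0 for t in vocab}
        for d in class_docs:
            for t in d:
                counts[t] += 1
        total = sum(counts.values())
        cond[c] = {t: log((counts[t] + 1) / (total + len(vocab)))
                   for t in vocab}
    return priors, cond, vocab

def classify(doc, priors, cond, vocab):
    """Pick the class maximizing log P(c) plus the summed term
    log-probabilities, the log-space form of Eq. (2.9)."""
    scores = {c: priors[c] + sum(cond[c][t] for t in doc if t in vocab)
              for c in priors}
    return max(scores, key=scores.get)

# Hypothetical training data: recipes a user liked or disliked
docs = [["tomato", "basil"], ["tomato", "garlic"], ["cream", "mushroom"]]
labels = ["liked", "liked", "disliked"]
model = train_multinomial_nb(docs, labels)
label = classify(["tomato", "basil", "garlic"], *model)  # → "liked"
```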
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is inefficiency at classification time: since there is no training phase, all the computation is performed when classifying a new item.
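The nearest neighbor procedure just described can be sketched as follows, using Euclidean distance over hypothetical structured feature vectors; the two-feature items and labels are illustrative:

```python
from math import sqrt

def knn_classify(item, training, k=3):
    """k-nearest-neighbour classification: no training phase, all the work
    happens at query time. training holds (feature_vector, label) pairs;
    Euclidean distance is used, as is common for structured data."""
    dist = lambda a, b: sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(training, key=lambda t: dist(item, t[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)  # majority vote

# Hypothetical two-feature items (e.g. savouriness, sweetness)
training = [((1.0, 0.0), "salad"), ((0.9, 0.2), "salad"),
            ((0.1, 1.0), "dessert"), ((0.0, 0.8), "dessert")]
label = knn_classify((0.8, 0.1), training)  # two nearest items are "salad"
```

Note that every classification scans the whole training set, which is exactly the inefficiency at query time mentioned above.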
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high; this problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity measure results from computing the cosine of the angle between the two vectors:
\[ Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} \, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2} \sqrt{\sum_{s \in S} r_{b,s}^2}} \tag{2.10} \]

In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
\[ sim(a, b) = \frac{\sum_{s \in S}(r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{a,s} - \bar{r}_a)^2 \sum_{s \in S}(r_{b,s} - \bar{r}_b)^2}} \tag{2.11} \]

In the formula, \(\bar{r}_a\) and \(\bar{r}_b\) are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

\[ pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12} \]

In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the result is added to Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
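Putting Eqs. (2.11) and (2.12) together on the data of Table 2.1 gives a small worked example. Restricting the neighborhood N to the positively correlated users, and computing each user's mean over the four common items, are illustrative choices, not prescriptions from this work:

```python
def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items two users rated in
    common, with each user's mean taken over those common items."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a)
           * sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

# Ratings from Table 2.1 for Item1..Item4; the second element of each
# pair is the neighbour's known rating for Item5
alice = [5, 3, 4, 4]
neighbours = {"User1": ([3, 1, 2, 3], 3),
              "User2": ([4, 3, 4, 3], 5),
              "User3": ([3, 3, 1, 5], 4),
              "User4": ([1, 5, 5, 2], 1)}

sims = {u: pearson(alice, r) for u, (r, _) in neighbours.items()}
# Eq. 2.12, keeping only positively correlated users as neighbourhood N
n = {u: s for u, s in sims.items() if s > 0}
mean_alice = sum(alice) / len(alice)
pred = mean_alice + sum(
    s * (neighbours[u][1] - sum(neighbours[u][0]) / len(neighbours[u][0]))
    for u, s in n.items()
) / sum(n.values())
# User1 and User2 both rated Item5 above their own averages, so the
# prediction lands above Alice's average of 4
```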
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and to recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods: the system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section, while other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even to reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be associated with content features (e.g., comedies liked by a user or dramas liked by a user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, and otherwise use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

Precision = tp / (tp + fp)    (2.13)

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = tp / (tp + fn)    (2.14)
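As a small illustration (not code from the thesis), the two measures translate directly into a pair of helper functions; the example counts below are made up:

```python
def precision(tp, fp):
    """Fraction of recommended items that were actually relevant (Eq. 2.13)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant items that were actually recommended (Eq. 2.14)."""
    return tp / (tp + fn)

# Example: 8 recommended items were liked, 2 were disliked,
# and 4 liked items were never recommended.
p = precision(8, 2)  # 8 / 10
r = recall(8, 4)     # 8 / 12
```

Note that the two measures pull in opposite directions: recommending more items tends to raise recall and lower precision.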
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = (1/n) Σ_{i=1}^{n} |p_i − r_i|    (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (p_i − r_i)² )    (2.16)
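A minimal sketch of both error measures (the rating lists are invented for illustration):

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); squaring penalizes large deviations."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# A single large deviation raises RMSE more than MAE:
preds, truth = [4.0, 3.0, 1.0], [4.0, 3.0, 4.0]
```

Here MAE is 1.0, while RMSE is about 1.73, showing the heavier weight given to the one large error.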
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.

1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:

I+_k = FF_k × IRF_k    (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D    (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = log(M / M_k)    (3.3)
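Eqs. (3.1)-(3.3) can be sketched as follows; the function names, the toy recipe collection, and the usage counts are hypothetical, not taken from [3]:

```python
import math

def irf(k, recipes):
    """Inverse Recipe Frequency (Eq. 3.3): log(M / M_k)."""
    M = len(recipes)
    M_k = sum(1 for ingredients in recipes if k in ingredients)
    return math.log(M / M_k)

def favourite_score(k, usage_count, period_days, recipes):
    """I+_k = FF_k * IRF_k (Eqs. 3.1 and 3.2), with FF_k = F_k / D."""
    ff = usage_count / period_days
    return ff * irf(k, recipes)

# "potato" appears in 3 of 4 recipes, so its IRF is low;
# "onion" appears in only 1 of 4, so its IRF is high.
recipes = [{"pepper", "potato"}, {"potato"}, {"onion", "potato"}, {"pepper"}]
```

As with IDF for words, a frequently used but common ingredient contributes less to the preference estimate than a frequently used rare one.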
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+_k, were computed. The F-measure is computed as follows:

F-measure = (2 × Precision × Recall) / (Precision + Recall)    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.
1 http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σ_k = sqrt( (1/n) Σ_{i=1}^{n} (g_k(i) − ḡ_k)² )    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. A recipe's final score R is computed considering the weights W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I−_k, respectively):

Score(R) = Σ_{k∈R} I_k · W_k    (3.6)
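The scoring rule of Eq. (3.6) can be sketched with hypothetical preference and weight values (all names and numbers below are illustrative, not from [21]):

```python
def recipe_score(recipe_ingredients, preference, weight):
    """Score(R) = sum over ingredients k in R of I_k * W_k (Eq. 3.6).
    preference[k] plays the role of I_k (negative = disliked) and
    weight[k] the deviation-based weight W_k."""
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)

preference = {"pepper": -0.8, "potato": 0.5}   # the user dislikes pepper
weight = {"pepper": 2.0, "potato": 1.0}        # pepper quantity deviates more

with_pepper = recipe_score({"pepper", "potato"}, preference, weight)
without_pepper = recipe_score({"potato"}, preference, weight)
```

Because the disliked ingredient carries a larger deviation weight, the recipe containing it scores noticeably lower.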
The approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u, where available, and of those predicted by the content-based method otherwise:

v_{u,i} = r_{u,i}, if user u rated item i
          c_{u,i}, otherwise    (3.7)
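Eq. (3.7) amounts to a simple fallback rule; a minimal sketch, with made-up items and a hypothetical content-based output:

```python
def pseudo_ratings(actual, content_predicted, items):
    """Eq. (3.7): keep the real rating where the user provided one,
    otherwise fall back to the content-based prediction."""
    return {i: actual[i] if i in actual else content_predicted[i]
            for i in items}

items = ["m1", "m2", "m3"]
actual = {"m1": 5}                                # the user rated only one movie
content_predicted = {"m1": 4, "m2": 3, "m3": 2}   # hypothetical content-based scores

v = pseudo_ratings(actual, content_predicted, items)
# v is a dense vector, ready for the Pearson-correlation step
```

The dense vectors built this way form the pseudo ratings matrix V used in the next step.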
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of items the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Therefore, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the recipe ratings after the recipe is broken down. Lastly, an intelligent strategy was implemented: only the positive ratings given to items that receive mixed ratings are considered, on the assumption that the common items in recipes with mixed ratings are not the cause of the high variation in scores. The results of the study are presented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories whose similarity to the story to be classified exceeds a minimum threshold become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
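The voting scheme of the short-term model can be sketched as follows; the threshold values, the tiny two-dimensional "TF-IDF" vectors, and the function names are all hypothetical, chosen only to illustrate the mechanism described in [23]:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def predict_story_score(new_vec, stories, min_sim=0.3, max_sim=0.95):
    """Stories more similar than min_sim become voters; a voter above
    max_sim marks the new story as already known to the user."""
    voters = [(cosine(new_vec, vec), score) for vec, score in stories
              if cosine(new_vec, vec) >= min_sim]
    if not voters:
        return None, "unclassified"   # falls through to the long-term model
    label = "known" if any(w >= max_sim for w, _ in voters) else "new"
    prediction = sum(w * s for w, s in voters) / sum(w for w, _ in voters)
    return prediction, label

stories = [((1.0, 0.0), 0.9), ((0.8, 0.6), 0.5)]  # (vector, score) pairs
pred, label = predict_story_score((1.0, 0.0), stories)
```

In this toy run the new story matches the first stored story exactly, so it is labeled as known and would be withheld from the recommendation list.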
This issue should be taken into consideration in food recommendations, as usually users are not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in item-to-item, two items are considered similar if they were rated in a similar way by the same group of users.

1 https://www.python.org

Figure 4.1: System architecture
Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / sqrt( Σ_{p∈P} (r_{a,p} − r̄_a)² · Σ_{p∈P} (r_{b,p} − r̄_b)² )    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are the average ratings of recipes a and b, respectively.

After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = Σ_{b∈N} sim(a, b) · r_{u,b} / Σ_{b∈N} sim(a, b)    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of similarities.
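A minimal sketch of the two steps, Eqs. (4.1) and (4.2); the ratings dictionary and user/recipe names are invented, and (as is common in practice) the denominator of the prediction uses absolute similarities so that negative correlations do not distort the normalization:

```python
def pearson_item_sim(ratings, a, b):
    """Pearson correlation between items a and b (Eq. 4.1), over the
    users that rated both. `ratings` maps user -> {item: rating}."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings[u][a] for u in common) / len(common)
    mean_b = sum(ratings[u][b] for u in common) / len(common)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in common)
    den_a = sum((ratings[u][a] - mean_a) ** 2 for u in common) ** 0.5
    den_b = sum((ratings[u][b] - mean_b) ** 2 for u in common) ** 0.5
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

def predict(ratings, user, a):
    """Eq. (4.2): the user's ratings weighted by item-to-item similarity."""
    pairs = [(pearson_item_sim(ratings, a, b), r)
             for b, r in ratings[user].items() if b != a]
    den = sum(abs(s) for s, _ in pairs)
    return sum(s * r for s, r in pairs) / den if den else None

ratings = {
    "u1": {"a": 5, "b": 5, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
    "u4": {"b": 4, "c": 1},           # u4 has not rated "a" yet
}
```

For u4, item a is predicted from the similar item b (positively correlated) and the dissimilar item c (negatively correlated).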
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
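A minimal sketch of this representation; the feature vocabulary and the example recipes are hypothetical stand-ins for the category/region/ingredient features described above:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Each feature gets a fixed position in the vector (hypothetical vocabulary):
FEATURES = ["cat:soup", "cat:dessert", "region:north", "ing:cod", "ing:egg"]

def to_vector(features):
    return [1.0 if f in features else 0.0 for f in FEATURES]

# Profile: union (binary OR) of the features of positively rated recipes.
profile = [0.0] * len(FEATURES)
for rated in [{"cat:soup", "region:north", "ing:cod"}]:   # rated 4 or 5
    profile = [max(p, v) for p, v in zip(profile, to_vector(rated))]

recipe = to_vector({"cat:soup", "ing:cod"})
# Candidate recipes would be ranked by cosine(profile, recipe), highest first.
```

A recipe sharing category and ingredient with the profile scores high; a recipe with no shared features scores zero and falls to the bottom of the list.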
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = avgTotal + 0.5, if similarity > 0.8
         avgTotal, otherwise    (4.3)
In the formula, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to a recipe's features and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to be always 1. The main reason is the lack of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipe feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which in this case are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the feature weights, determined by the IRF value, are added to the vector; in negative observations, they are subtracted.
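A minimal sketch of this construction, combining the IRF weighting of Eq. (4.4) with the add/subtract update; the toy recipe collection and the rated examples are hypothetical:

```python
import math

def irf_weights(recipes):
    """IRF weight for every feature over the whole collection (Eq. 4.4)."""
    M = len(recipes)
    counts = {}
    for feats in recipes:
        for k in feats:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / c) for k, c in counts.items()}

def build_prototype(rated, weights):
    """Add a recipe's feature weights on positive observations and
    subtract them on negative ones (both with equal weight 1)."""
    proto = {}
    for feats, positive in rated:            # rated: list of (features, bool)
        sign = 1.0 if positive else -1.0
        for k in feats:
            proto[k] = proto.get(k, 0.0) + sign * weights.get(k, 0.0)
    return proto

recipes = [{"cod", "onion"}, {"cod"}, {"egg"}, {"egg", "onion"}]
w = irf_weights(recipes)
proto = build_prototype([({"cod", "onion"}, True), ({"egg"}, False)], w)
```

The resulting prototype has positive entries for features of liked recipes and negative entries for features of disliked ones, ready to be compared with candidate recipes by cosine similarity.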
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B, which fits in the range [C, D], as shown in the following formula3:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) - min(A)), the user average was used as default for the recommendation.
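The per-user mapping, including the fallback to the user average when the similarity interval is degenerate, can be sketched as follows (a minimal illustration; the function name is mine):

```python
def minmax_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity value into the user's rating range with Min-Max
    normalization (Eq. 4.5). Falls back to the user's average rating when
    there are not enough ratings to form a similarity interval."""
    if sim_max <= sim_min:
        return user_avg
    return (similarity - sim_min) / (sim_max - sim_min) * (rating_max - rating_min) + rating_min
```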
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if L <= similarity < U
         average rating - standard deviation,  if similarity < L
                                                                    (4.6)
Three different approaches were tested: using the user's rating average and the user standard deviation; using the recipe's rating average and the recipe standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments.

                                     Food.com   Epicurious
Number of users                        24741        8117
Number of food items                  226025       14976
Number of rating events               956826       86574
Number of ratings above avg           726467       46588
Number of groups                         108          68
Number of ingredients                   5074         338
Number of categories                      28          14
Sparsity of the ratings matrix         0.02%       0.07%
Avg rating value                        4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
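The decision rule of Eq. 4.6 can be sketched as follows, using the initial thresholds U = 0.75 and L = 0.25 given above (a minimal illustration; the function name and signature are mine):

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Rating prediction rule of Eq. 4.6: push the prediction one standard
    deviation above (below) the average for high (low) similarity values."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

The same function covers the three tested variants, depending on whether `avg` and `std` come from the user, the item, or their combination.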
4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious.⁵ This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.⁶ Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds, so the process is repeated 5 times, a setup also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
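The fold construction described above can be sketched as follows (a minimal illustration of the 5-fold split; the function name is mine):

```python
def kfold_splits(events, k=5):
    """Partition the rating events into k folds; each round uses one fold
    as the validation set (20% for k=5) and the rest as the training set."""
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```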
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
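The two measures are straightforward to compute; a minimal sketch over parallel lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between predictions and ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but penalizing larger deviations more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```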
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines.

                                  Epicurious           Food.com
                                MAE      RMSE        MAE      RMSE
YoLP Content-based component   0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component   0.6454   0.8678      0.3761   0.6834
User Average                   0.6315   0.8338      0.4077   0.6207
Item Average                   0.7701   1.0930      0.4385   0.7043
Combined Average               0.6628   0.8572      0.4180   0.6250
Table 5.2: Test results.

                                               Epicurious                            Food.com
                                   Observation      Observation          Observation      Observation
                                   User Average     Fixed Threshold      User Average     Fixed Threshold
                                   MAE     RMSE     MAE     RMSE         MAE     RMSE     MAE     RMSE
User Avg + User Std. Deviation    0.8217  1.0606   0.7759  1.0283       0.4448  0.6812   0.4287  0.6624
Item Avg + Item Std. Deviation    0.8914  1.1550   0.8388  1.1106       0.4561  0.7251   0.4507  0.7207
User/Item Avg + Std. Deviations   0.8304  1.0296   0.7824  0.9927       0.4390  0.6506   0.4324  0.6449
Min-Max                           0.8539  1.1533   0.7721  1.0705       0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
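These baseline predictors reduce to a few lines; a minimal sketch (function name and `mode` parameter are mine):

```python
def baseline_prediction(user_avg, item_avg, mode="combined"):
    """Simple baseline predictors used for comparison: the user average,
    the item average, or their combination (UserAvg + ItemAvg) / 2."""
    if mode == "user":
        return user_avg
    if mode == "item":
        return item_avg
    return (user_avg + item_avg) / 2.0
```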
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features.

                                       Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616      0.4324   0.7087
Ingredients                         0.7932   1.0054      0.4411   0.6537
Cuisine                             0.8553   1.0810      0.5357   0.7431
Dietary                             0.8772   1.0807      0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
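The merging of the per-feature-type vectors and the cosine comparison can be sketched as follows. This is a minimal illustration assuming the 3 stored vectors are kept as dictionaries keyed by feature type; all names are mine:

```python
import math

def merge_vectors(stored, selected=("ingredients", "cuisine", "dietary")):
    """Merge the per-feature-type vectors stored for a user into a single
    prototype vector, keeping only the selected feature types."""
    merged = {}
    for name in selected:
        # Feature types use disjoint feature names, so update() never collides.
        merged.update(stored.get(name, {}))
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination then amounts to passing a different `selected` tuple, with no rebuilding of the stored vectors.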
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if L <= similarity < U
         average rating - standard deviation,  if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if similarity < U
                                                                    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest chosen threshold for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results in transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information contained only the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; adding the major difference in the dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example: season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute a predicted rating to it.
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823.
Abstract
Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.
Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing
Contents
Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
1.1 Dissertation Structure
2 Fundamental Concepts
2.1 Recommendation Systems
2.1.1 Content-Based Methods
2.1.2 Collaborative Methods
2.1.3 Hybrid Methods
2.2 Evaluation Methods in Recommendation Systems
3 Related Work
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
3.2 Content-Boosted Collaborative Recommendation
3.3 Recommending Food: Reasoning on Recipes and Ingredients
3.4 User Modeling for Adaptive News Access
4 Architecture
4.1 YoLP Collaborative Recommendation Component
4.2 YoLP Content-Based Recommendation Component
4.3 Experimental Recommendation Component
4.3.1 Rocchio's Algorithm using FF-IRF
4.3.2 Building the Users' Prototype Vector
4.3.3 Generating a rating value from a similarity value
4.4 Database and Datasets
5 Validation
5.1 Evaluation Metrics and Cross Validation
5.2 Baselines and First Results
5.3 Feature Testing
5.4 Similarity Threshold Variation
5.5 Standard Deviation Impact in Recommendation Error
5.6 Rocchio's Learning Curve
6 Conclusions
6.1 Future Work
Bibliography
List of Tables
2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10-Fold Cross-Validation example
5.2 Lower similarity threshold variation test using the Epicurious dataset
5.3 Lower similarity threshold variation test using the Food.com dataset
5.4 Upper similarity threshold variation test using the Epicurious dataset
5.5 Upper similarity threshold variation test using the Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning curve using the Epicurious dataset, up to 40 rated recipes
5.9 Learning curve using the Food.com dataset, up to 100 rated recipes
5.10 Learning curve using the Food.com dataset, up to 500 rated recipes
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones appear every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is therefore a topic of great interest.
In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe as analogous to the words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were also analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on their consumer behaviour, recommendations specifically adjusted to their personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better support the proposed objectives and the discussion of related work in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance associated between it and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \quad (2.1) \]

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent ones. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \quad (2.2) \]

In the formula, N is the total number of documents, and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

\[ w_{i,j} = TF_{i,j} \times IDF_i \quad (2.3) \]
It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when weighting a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

\[ w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{z,j})^2}} \quad (2.4) \]
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

\[ \mathrm{Similarity}(a,b) = \frac{\sum_k w_{k,a}\, w_{k,b}}{\sqrt{\sum_k w_{k,a}^2}\,\sqrt{\sum_k w_{k,b}^2}} \quad (2.5) \]
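To make the scheme above concrete, the TF (Eq. 2.1), IDF (Eq. 2.2), normalized TF-IDF weighting (Eqs. 2.3 and 2.4), and cosine similarity (Eq. 2.5) computations can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the toy documents are my own, not part of the system developed in this work:

```python
import math

def tf(doc):
    # Term frequency (Eq. 2.1): raw counts divided by the max count in the doc.
    counts = {}
    for term in doc:
        counts[term] = counts.get(term, 0) + 1
    max_f = max(counts.values())
    return {t: f / max_f for t, f in counts.items()}

def idf(docs):
    # Inverse document frequency (Eq. 2.2): log(N / n_i) for each term i.
    N = len(docs)
    all_terms = {t for d in docs for t in d}
    return {t: math.log(N / sum(1 for d in docs if t in d)) for t in all_terms}

def tf_idf(doc, idf_weights):
    # TF-IDF weights (Eq. 2.3) followed by cosine normalization (Eq. 2.4).
    w = {t: f * idf_weights.get(t, 0.0) for t, f in tf(doc).items()}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()} if norm > 0 else w

def cosine(a, b):
    # Cosine similarity (Eq. 2.5) between two sparse weight vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Here documents are lists of terms and vectors are dictionaries, which keeps the sparsity of the VSM representation explicit; a term that occurs in every document gets an IDF of zero and thus no discriminative weight, exactly as intended.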
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \(\vec{c_i} = (w_{1,i}, \ldots, w_{|T|,i})\) for each class c_i, with T being the vocabulary, i.e. the set of distinct terms in the training set. The weight for each term is given by the following formula:
\[ w_{k,i} = \beta \sum_{d_j \in POS_i} \frac{w_{k,j}}{|POS_i|} \;-\; \gamma \sum_{d_j \in NEG_i} \frac{w_{k,j}}{|NEG_i|} \quad (2.6) \]

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{k,j} is the TF-IDF weight for term k in document d_j. The parameters \(\beta\) and \(\gamma\) control the influence of the positive and negative examples. A document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \(\vec{c_i}\) and the document vector \(\vec{d_j}\).
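As a sketch, Eq. (2.6) and the subsequent classification step can be implemented directly on sparse TF-IDF vectors. The values beta = 16 and gamma = 4 used as defaults below are common choices from the relevance-feedback literature, not the parameters used in this work, and the function names are my own:

```python
import math

def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    # Prototype vector for one class (Eq. 2.6): the beta-weighted average of
    # the positive example vectors minus the gamma-weighted average of the
    # negative ones. pos and neg are lists of sparse vectors (dicts mapping
    # term -> TF-IDF weight).
    proto = {}
    for d in pos:
        for t, w in d.items():
            proto[t] = proto.get(t, 0.0) + beta * w / len(pos)
    for d in neg:
        for t, w in d.items():
            proto[t] = proto.get(t, 0.0) - gamma * w / len(neg)
    return proto

def classify(doc, prototypes):
    # Assign doc to the class whose prototype has the highest cosine similarity.
    def cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    return max(prototypes, key=lambda c: cos(doc, prototypes[c]))
```

Note that terms that occur only in negative examples end up with negative prototype weights, which actively pushes similar documents away from the class.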
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• P(c): the probability of observing a document in class c
• P(d|c): the probability of observing the document d given a class c
• P(d): the probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

\[ P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \quad (2.7) \]

When performing classification, each document d is assigned to the class c_j with the highest probability:

\[ \arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \quad (2.8) \]

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant and irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to observe the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, reflecting whether or not the word appears in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \quad (2.9) \]

In the formula, N(d_i,t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document, \(t_k \in V_{d_i}\), are used.
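A minimal sketch of a multinomial Naive Bayes classifier following Eq. (2.9) is shown below. It works in log space to avoid numerical underflow and uses add-one (Laplace) smoothing for unseen term-class pairs; both are standard refinements not discussed in the text above, and the function names are my own:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    # Estimate the class priors P(c) and the smoothed term likelihoods P(t|c)
    # from word counts in the training data (add-one smoothing).
    vocab = {t for d in docs for t in d}
    priors = {c: labels.count(c) / len(labels) for c in set(labels)}
    counts = defaultdict(Counter)
    for d, c in zip(docs, labels):
        counts[c].update(d)
    likelihood = {}
    for c in priors:
        total = sum(counts[c].values())
        likelihood[c] = {t: (counts[c][t] + 1) / (total + len(vocab))
                         for t in vocab}
    return priors, likelihood

def classify_nb(doc, priors, likelihood):
    # Eq. 2.9 in log space: argmax_c  log P(c) + sum_t N(d,t) * log P(t|c).
    # Iterating over the document's tokens applies the N(d,t) exponent
    # implicitly; words outside the training vocabulary are skipped.
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for t in doc:
            if t in likelihood[c]:
                score += math.log(likelihood[c][t])
        if score > best_score:
            best, best_score = c, score
    return best
```

In a recommendation setting, the two classes could simply be "relevant" and "irrelevant" items, as mentioned above.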
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of these nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they have no training phase and all the computation is performed when classifying.
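The classification step just described can be sketched in a few lines; Euclidean distance is used here, as is common for structured data (for VSM vectors, cosine similarity would be the usual choice). This is an illustrative sketch with names of my own choosing:

```python
import math
from collections import Counter

def knn_classify(x, train, k=3):
    # Nearest-neighbor classification: there is no training phase, so all
    # the work happens at query time. train is a list of (vector, label)
    # pairs; the label is decided by a majority vote among the k closest
    # training items.
    nearest = sorted(train, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The full sort over the training set makes the quadratic query cost of the naive approach explicit; in practice, index structures are used to avoid comparing against every stored item.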
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity results from computing the cosine of the angle between the two vectors:

\[ \mathrm{Similarity}(a,b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\,\sqrt{\sum_{s \in S} r_{b,s}^2}} \quad (2.10) \]
In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item s. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analysed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.

Figure 2.2: Comparing user ratings [2]
\[ \mathrm{sim}(a,b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}} \quad (2.11) \]
In the formula, \(\bar{r}_a\) and \(\bar{r}_b\) are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction, using a common prediction function:

\[ \mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)} \quad (2.12) \]

In the formula, pred(a,p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that have rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
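Putting Eqs. (2.11) and (2.12) together, the whole prediction pipeline can be sketched on the ratings from Table 2.1. Two notes on assumptions in this sketch: user averages are computed over each user's full set of ratings, all users who rated the target item are taken as neighbors (rather than the top-N), and the denominator sums absolute similarities, a common refinement of Eq. (2.12) that keeps the prediction stable when some correlations are negative. Function names are my own:

```python
import math

def mean(r):
    # Average rating of a user, over all items that user has rated.
    return sum(r.values()) / len(r)

def pearson(ratings, a, b):
    # Pearson correlation (Eq. 2.11) over the items rated by both users.
    common = [s for s in ratings[a] if s in ratings[b]]
    ra, rb = mean(ratings[a]), mean(ratings[b])
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    da = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common))
    db = math.sqrt(sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / (da * db) if da and db else 0.0

def predict(ratings, a, p):
    # Eq. 2.12: the user's mean rating plus the similarity-weighted,
    # mean-centred ratings of the neighbors who rated item p.
    neighbors = [(b, pearson(ratings, a, b))
                 for b in ratings if b != a and p in ratings[b]]
    num = sum(s * (ratings[b][p] - mean(ratings[b])) for b, s in neighbors)
    den = sum(abs(s) for _, s in neighbors)
    return mean(ratings[a]) + (num / den if den else 0.0)
```

On the Table 2.1 data, User1 and User2 come out strongly correlated with Alice, User3 is uncorrelated, and User4 is negatively correlated, so the prediction for Alice's Item5 lands above her average rating of 4.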
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a,b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing rating predictions from them [15]. Empirical evidence suggests that item-based algorithms can provide better computational performance, with comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section; other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. This design is exemplified by content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied with two components that perform well individually but complement each other in different situations (e.g., when few ratings exist one should recommend popular items, otherwise use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is then exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = tp / (tp + fp)    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = tp / (tp + fn)    (2.14)
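Both measures reduce to simple set arithmetic over the recommended and relevant item sets. The following sketch (a hypothetical helper, not from the thesis) makes the tp/fp/fn counting explicit:

```python
def precision_recall(recommended, relevant):
    """Precision and recall for one user's recommendation list.
    `recommended`: items the system returned;
    `relevant`: items the user actually liked."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # correctly recommended
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```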
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = (1/n) Σ_{i=1}^{n} |p_i − r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (p_i − r_i)^2 )    (2.16)
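A direct translation of Eqs. (2.15) and (2.16), assuming two parallel lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); penalizes large deviations more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))
```

For any set of predictions, RMSE is greater than or equal to MAE, with equality only when all errors have the same magnitude.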
The RMSE measure was used in the famous Netflix competition^1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:
I+_k = FF_k × IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = F_k / D    (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
IRF_k = log(M / M_k)    (3.3)
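Putting Eqs. (3.1) to (3.3) together gives a compact score. The sketch below is illustrative (function and parameter names are hypothetical, and the base of the logarithm is not specified in [3], so the natural logarithm is assumed):

```python
import math

def favourite_score(times_used, period_days, total_recipes, recipes_with_k):
    """I+_k = FF_k * IRF_k: how often the user chose ingredient k over a
    period, weighted by how rare k is across the whole recipe collection."""
    ff = times_used / period_days                   # FF_k = F_k / D     (Eq. 3.2)
    irf = math.log(total_recipes / recipes_with_k)  # IRF_k = log(M/M_k) (Eq. 3.3)
    return ff * irf
```

Mirroring the TF-IDF intuition, a frequently used but ubiquitous ingredient (large M_k) scores lower than an equally frequent rare one.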
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad^1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire; responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+_k, were computed. The F-measure is computed as follows:
F-measure = (2 × Precision × Recall) / (Precision + Recall)    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual users' favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and on the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σ_k = sqrt( (1/n) Σ_{i=1}^{n} (g_k(i) − ḡ_k)^2 )    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I−_k, respectively):
Score(R) = Σ_{k∈R} (I_k · W_k)    (3.6)
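The two formulas above can be sketched as follows. This is an illustration only: how the weight W_k is derived from the deviation score is detailed in [21], so here the weights are taken as given, and all names are hypothetical.

```python
import math

def ingredient_std(quantities):
    """σ_k (Eq. 3.5): population standard deviation of the quantities of
    ingredient k across the recipes that contain it."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Score(R) (Eq. 3.6): sum of I_k * W_k over the ingredients of R.
    `preference` maps ingredient -> I_k (positive for liked, negative for
    disliked); `weight` maps ingredient -> W_k."""
    return sum(preference[k] * weight[k] for k in recipe_ingredients)
```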
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u where available, and of those predicted by the content-based method otherwise:
v_{u,i} = r_{u,i}, if user u rated item i
v_{u,i} = c_{u,i}, otherwise    (3.7)
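Eq. (3.7) amounts to filling the gaps of a sparse ratings row with content-based predictions. A one-line sketch, with hypothetical input structures:

```python
def pseudo_ratings(actual, content_predicted, items):
    """Dense pseudo user-ratings vector (Eq. 3.7): the real rating r_ui where
    the user rated the item, the content-based prediction c_ui otherwise."""
    return [actual[i] if i in actual else content_predicted[i] for i in items]
```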
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was of 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to consistently perform better than pure collaborative filtering.
Figure 3.1: Recipe - ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies are differentiated from one another by the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as usually users are not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python^1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach, explained in detail in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation^2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( sqrt(Σ_{p∈P} (r_{a,p} − r̄_a)^2) × sqrt(Σ_{p∈P} (r_{b,p} − r̄_b)^2) )    (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
pred(u, a) = Σ_{b∈N} sim(a, b) · r_{u,b} / Σ_{b∈N} sim(a, b)    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of the similarities.
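The two equations above can be sketched as follows. This is an illustrative sketch, not the YoLP implementation; it assumes ratings are stored per item as a dictionary mapping item → user → rating, plus a precomputed similarity table, and all names are hypothetical.

```python
import math

def item_pearson(ratings_by_item, a, b):
    """Eq. (4.1): Pearson correlation between recipes a and b over the
    group of users P that rated both."""
    shared = set(ratings_by_item[a]) & set(ratings_by_item[b])
    if len(shared) < 2:
        return 0.0
    mean_a = sum(ratings_by_item[a].values()) / len(ratings_by_item[a])
    mean_b = sum(ratings_by_item[b].values()) / len(ratings_by_item[b])
    num = sum((ratings_by_item[a][p] - mean_a) * (ratings_by_item[b][p] - mean_b)
              for p in shared)
    den = (math.sqrt(sum((ratings_by_item[a][p] - mean_a) ** 2 for p in shared)) *
           math.sqrt(sum((ratings_by_item[b][p] - mean_b) ** 2 for p in shared)))
    return num / den if den else 0.0

def predict_item_based(user_ratings, target, sims):
    """Eq. (4.2): weighted average of the user's own ratings, weighted by
    each rated item's similarity to the target item."""
    num = sum(sims[(target, b)] * r for b, r in user_ratings.items())
    den = sum(abs(sims[(target, b)]) for b in user_ratings)
    return num / den if den else 0.0
```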
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered in the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
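The profile construction and cosine comparison described above can be sketched as follows (an illustration under the stated 4-or-5 positive-rating rule; feature positions, inputs, and names are hypothetical):

```python
import math

def build_profile(rated_recipes, num_features, min_positive=4):
    """Binary user profile: switch on every feature position that appears
    in a recipe the user rated positively (>= min_positive).
    `rated_recipes` is a list of (feature_ids, rating) pairs."""
    profile = [0] * num_features
    for feature_ids, rating in rated_recipes:
        if rating >= min_positive:
            for f in feature_ids:
                profile[f] = 1
    return profile

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Candidate recipes would then be ranked by `cosine(profile, recipe_vector)`, from most to least similar.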
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In the collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating = avgTotal + 0.5, if similarity > 0.8
Rating = avgTotal, otherwise    (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features in the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)    (4.4)
Here, M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations, and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which in this case are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As was explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector, while in negative observations the feature weights are subtracted.
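The update rule described above can be sketched as follows (a simplified illustration using a fixed rating threshold to separate positive from negative observations; the weights would be the IRF values of Eq. 4.4, and all names are hypothetical):

```python
def build_prototype(rated_recipes, feature_weights, num_features, threshold):
    """Rocchio-style prototype vector: add the recipe's feature weights for
    positive observations, subtract them for negative ones (both with an
    equal weight of 1, as in the experiments).
    `rated_recipes` is a list of (feature_ids, rating) pairs."""
    prototype = [0.0] * num_features
    for feature_ids, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0  # positive vs negative observation
        for f in feature_ids:
            prototype[f] += sign * feature_weights[f]
    return prototype
```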
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to be mapped into a specific range of rating values. Many normalization techniques are available; the one chosen for this work was Min-Max normalization, which transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

    B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C        (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were thus applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as default for the recommendation.
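Eq. 4.5, together with the per-user fallback described above, might look like this (a sketch; the parameter names are my assumptions):

```python
def minmax_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Map a similarity value into the user's rating range using
    Min-Max normalization (Eq. 4.5), falling back to the user's
    average rating when the similarity interval is degenerate."""
    if sim_max == sim_min:        # not enough ratings to form a scale
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min

minmax_rating(0.5, 0.0, 1.0, 1, 5, user_avg=3.0)   # → 3.0
```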
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

    Rating = average rating + standard deviation,  if similarity ≥ U
             average rating,                       if L ≤ similarity < U
             average rating − standard deviation,  if similarity < L        (4.6)
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments.

                                        Food.com   Epicurious
  Number of users                         24741        8117
  Number of food items                   226025       14976
  Number of rating events                956826       86574
  Number of ratings above avg            726467       46588
  Number of groups                          108          68
  Number of ingredients                    5074         338
  Number of categories                       28          14
  Sparsity of the ratings matrix          0.02%       0.07%
  Avg rating value                         4.68        3.34
  Avg number of ratings per user          38.67       10.67
  Avg number of ratings per item           4.23        5.78
  Avg number of ingredients per item       8.57        3.71
  Avg number of categories per item        2.33        0.60
  Avg number of food groups per item       0.87        0.61
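Eq. 4.6 with the initial thresholds can be sketched as follows (illustrative names; U and L default to the initial 0.75 and 0.25):

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Translate a Rocchio similarity into a rating following Eq. 4.6:
    add the standard deviation above the upper threshold, subtract it
    below the lower one, and return the plain average in between."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg

threshold_rating(0.8, avg=3.5, std=0.7)   # → 4.2
```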
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe-sharing community. The second dataset is composed of data crawled from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com   ⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value.
Figure 4.4: Distribution of Food.com rating events per rating value.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
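As an illustration only, a relational schema for this data could look like the following (table and column names are my assumptions, and SQLite stands in for MySQL in this sketch):

```python
import sqlite3

# Hypothetical schema for the data described above.
schema = """
CREATE TABLE recipe         (recipe_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature        (feature_id INTEGER PRIMARY KEY,
                             kind TEXT,   -- 'ingredient' | 'cuisine' | 'dietary'
                             name TEXT);
CREATE TABLE recipe_feature (recipe_id INTEGER, feature_id INTEGER);
CREATE TABLE rating         (user_id INTEGER, recipe_id INTEGER, value REAL);
CREATE TABLE prototype      (user_id INTEGER, feature_id INTEGER, weight REAL);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(schema)
```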
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set; ideally, it is repeated until all possible combinations of p observations have been tested. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the data was divided into 5 folds, so the process is repeated 5 times (5-fold cross-validation). For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
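The 5-fold partitioning described above can be sketched as follows (an illustrative helper, not the thesis implementation):

```python
import random

def five_fold_splits(events, k=5, seed=0):
    """Shuffle the rating events and yield (train, validation) pairs:
    each fold keeps 20% of the data for validation, 80% for training."""
    events = events[:]
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, validation
```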
Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
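MAE and RMSE as used here are the standard definitions, which can be written as:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

mae([4, 3, 5], [3, 3, 4])    # ≈ 0.667
rmse([4, 3, 5], [3, 3, 4])   # ≈ 0.816
```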
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, first some baselines need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines.

                                    Epicurious          Food.com
                                   MAE     RMSE       MAE     RMSE
  YoLP content-based component    0.6389  0.8279     0.3590  0.6536
  YoLP collaborative component    0.6454  0.8678     0.3761  0.6834
  User average                    0.6315  0.8338     0.4077  0.6207
  Item average                    0.7701  1.0930     0.4385  0.7043
  Combined average                0.6628  0.8572     0.4180  0.6250
Table 5.2: Test results.

                                     Epicurious                            Food.com
                           Observation      Observation         Observation      Observation
                           user average     fixed threshold     user average     fixed threshold
                           MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
  User avg + user
  standard deviation      0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
  Item avg + item
  standard deviation      0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
  User/item avg + user
  and item std dev        0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
  Min-Max                 0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that the combination of both user and item average ratings and standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features.

                                         Epicurious          Food.com
                                        MAE     RMSE       MAE     RMSE
  Ingredients + cuisine + dietaries    0.7824  0.9927     0.4324  0.6449
  Ingredients + cuisine                0.7915  1.0012     0.4384  0.6502
  Ingredients + dietary                0.7874  0.9986     0.4342  0.6468
  Cuisine + dietary                    0.8266  1.0616     0.4324  0.7087
  Ingredients                          0.7932  1.0054     0.4411  0.6537
  Cuisine                              0.8553  1.0810     0.5357  0.7431
  Dietary                              0.8772  1.0807     0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the best-performing method combination was the following:
• use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
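The merge-then-compare scheme can be sketched as follows (illustrative helper names; vectors are sparse {feature: weight} dictionaries):

```python
import math

def merge(vectors):
    """Merge the separately stored per-feature-type prototype vectors
    (ingredients, cuisine, dietary) into one vector for the feature
    combination under test."""
    merged = {}
    for vec in vectors:
        for feature, weight in vec.items():
            merged[feature] = merged.get(feature, 0.0) + weight
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical stored vectors for one user: testing "Ingredients + Cuisine"
# merges those two and simply ignores the dietary vector.
profile = merge([{"garlic": 1.7, "pasta": 0.9}, {"italian": 1.2}])
```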
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case: some features, like, for example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

    Rating = average rating + standard deviation,  if similarity ≥ U
             average rating,                       if L ≤ similarity < U
             average rating − standard deviation,  if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
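The threshold variation test amounts to re-running the same evaluation over a grid of L values; a sketch (the `evaluate` callback is an assumption standing in for the full cross-validation run):

```python
def sweep_lower_threshold(evaluate, lows, upper=0.75):
    """Re-run the evaluation for each candidate lower threshold L.
    `evaluate(L, U)` returns (MAE, RMSE) for that setting; the sweep
    collects (L, MAE, RMSE) tuples and keeps the best one by MAE."""
    results = [(low,) + tuple(evaluate(low, upper)) for low in lows]
    return min(results, key=lambda t: t[1])
```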
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.
    Rating = average rating + standard deviation,  if similarity ≥ U
             average rating,                       if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between tests.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.
the lowest errors registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a given number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
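The simulation loop described above can be sketched as follows (an illustrative helper; `train_and_eval` stands in for rebuilding the prototype vector and measuring the error):

```python
def learning_curve(user_events, train_and_eval):
    """Simulate continuous learning for one user: each round moves one
    more review into the training set and measures the error on the
    remaining reviews. `train_and_eval(train, validation)` is an
    assumed callback that returns the MAE over the validation set."""
    errors = []
    for n in range(1, len(user_events)):
        train, validation = user_events[:n], user_events[n:]
        errors.append(train_and_eval(train, validation))
    return errors
```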
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which are chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
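This multi-prototype variant could be sketched as follows (purely illustrative; the names are my own, not a proposed implementation from the thesis):

```python
import math

def predict_class(recipe_vec, class_vectors):
    """One Rocchio-style vector per rating class; the predicted rating
    is the class whose vector is most similar to the recipe."""
    def cosine(u, v):
        dot = sum(w * v.get(f, 0.0) for f, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    return max(class_vectors, key=lambda r: cosine(recipe_vec, class_vectors[r]))
```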
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems: A landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Lshii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 LNCS, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
51
Resumo
Esta dissertação explora a aplicabilidade de métodos baseados em conteúdo para recomendações personalizadas no domínio alimentar. A recomendação neste domínio é uma área relativamente nova, existindo poucos sistemas implementados em ambiente real que se baseiam nas preferências de utilizadores. Métodos utilizados frequentemente noutras áreas, como o algoritmo de Rocchio na classificação de documentos, podem ser adaptados para recomendações no domínio alimentar. Com o objectivo de explorar métodos baseados em conteúdo na área de recomendação alimentar, foi desenvolvida uma plataforma para avaliar a aplicabilidade do algoritmo de Rocchio aplicado a este domínio. Para além da validação do algoritmo explorado neste estudo, foram efectuados outros testes, como o impacto do desvio padrão no erro de recomendação e a curva de aprendizagem do algoritmo.
Palavras-chave: Sistemas de Recomendação, Recomendação Baseada em Conteúdo, Comida, Receita, Aprendizagem Autónoma
Abstract
Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.
Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing
Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
  1.1 Dissertation Structure
2 Fundamental Concepts
  2.1 Recommendation Systems
    2.1.1 Content-Based Methods
    2.1.2 Collaborative Methods
    2.1.3 Hybrid Methods
  2.2 Evaluation Methods in Recommendation Systems
3 Related Work
  3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
  3.2 Content-Boosted Collaborative Recommendation
  3.3 Recommending Food: Reasoning on Recipes and Ingredients
  3.4 User Modeling for Adaptive News Access
4 Architecture
  4.1 YoLP Collaborative Recommendation Component
  4.2 YoLP Content-Based Recommendation Component
  4.3 Experimental Recommendation Component
    4.3.1 Rocchio's Algorithm using FF-IRF
    4.3.2 Building the Users' Prototype Vector
    4.3.3 Generating a rating value from a similarity value
  4.4 Database and Datasets
5 Validation
  5.1 Evaluation Metrics and Cross Validation
  5.2 Baselines and First Results
  5.3 Feature Testing
  5.4 Similarity Threshold Variation
  5.5 Standard Deviation Impact in Recommendation Error
  5.6 Rocchio's Learning Curve
6 Conclusions
  6.1 Future Work
Bibliography
List of Tables

2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features
List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe - ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10-Fold Cross-Validation example
5.2 Lower similarity threshold variation test using the Epicurious dataset
5.3 Lower similarity threshold variation test using the Food.com dataset
5.4 Upper similarity threshold variation test using the Epicurious dataset
5.5 Upper similarity threshold variation test using the Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering high amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.
In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as analogous to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed: the algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal (YoLP)1, and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the related work discussed in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about users' needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, which is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values; content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of the term to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \tag{2.1} \]

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j. This value is divided by \max_z f_{z,j}, the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2} \]

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:
\[ w_{i,j} = TF_{i,j} \times IDF_i \tag{2.3} \]
It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when weighting a term.
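As an illustration of Eqs. (2.1)-(2.3), the following Python sketch computes TF-IDF weights for a toy corpus of tokenized documents. The ingredient-like tokens are hypothetical and used only for illustration; this is a minimal sketch, not the implementation used in this work.

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """TF-IDF weights (Eqs. 2.1-2.3) for a list of tokenized documents."""
    N = len(docs)
    df = Counter()                    # n_i: number of documents containing keyword i
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        max_f = max(counts.values())  # max_z f_{z,j}
        weights.append({t: (f / max_f) * log(N / df[t])  # TF * IDF
                        for t, f in counts.items()})
    return weights

# hypothetical recipe "documents" whose words are ingredients
docs = [["salt", "pepper", "salt", "olive"],
        ["pepper", "garlic", "onion"],
        ["salt", "garlic", "salt", "salt"]]
w = tf_idf(docs)
```

Note how a rare keyword such as "onion" receives a higher weight than "pepper", which occurs in two of the three documents.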
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:
\[ w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{z,j})^2}} \tag{2.4} \]
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
\[ Similarity(a, b) = \frac{\sum_k w_{k,a}\, w_{k,b}}{\sqrt{\sum_k w_{k,a}^2}\, \sqrt{\sum_k w_{k,b}^2}} \tag{2.5} \]
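The normalization of Eq. (2.4) and the cosine measure of Eq. (2.5) can be sketched as follows. The sparse-dictionary representation and the example vectors are illustrative assumptions, not part of the original text.

```python
from math import sqrt

def normalize(weights):
    """Cosine normalization (Eq. 2.4): scale a weight vector to unit norm."""
    norm = sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb)

# hypothetical TF-IDF weights for a document and a user profile
doc = normalize({"salt": 2.0, "pepper": 1.0})
profile = {"salt": 1.0, "garlic": 1.0}
sim = cosine(doc, profile)  # in [0, 1] for non-negative weights
```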
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1,i}, \ldots, w_{|T|,i}) for each class c_i, where T is the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:
\[ w_{k,i} = \beta \sum_{d_j \in POS_i} \frac{w_{k,j}}{|POS_i|} \;-\; \gamma \sum_{d_j \in NEG_i} \frac{w_{k,j}}{|NEG_i|} \tag{2.6} \]
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{k,j} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the influence of the positive and negative examples. A document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
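A minimal sketch of Rocchio's prototype construction (Eq. 2.6) and similarity-based scoring follows. The example TF-IDF vectors are hypothetical, and the values β = 16 and γ = 4 are commonly used defaults, assumed here for illustration only.

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity between sparse weight vectors (Eq. 2.5)
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values())) or 1.0
    nb = sqrt(sum(w * w for w in b.values())) or 1.0
    return dot / (na * nb)

def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    """Class prototype (Eq. 2.6): beta times the mean of the positive
    example vectors minus gamma times the mean of the negative ones."""
    vocab = set().union(*pos, *neg)
    return {t: beta * sum(d.get(t, 0.0) for d in pos) / len(pos)
               - gamma * sum(d.get(t, 0.0) for d in neg) / len(neg)
            for t in vocab}

# hypothetical TF-IDF vectors for recipes the user liked / disliked
liked    = [{"garlic": 0.8, "basil": 0.6}, {"garlic": 0.7, "tomato": 0.7}]
disliked = [{"anchovy": 0.9, "caper": 0.4}]
proto = rocchio_prototype(liked, disliked)
# score a new recipe against the prototype (higher = closer to the class)
score = cosine({"garlic": 0.9, "anchovy": 0.1}, proto)
```

Terms frequent in the negative examples end up with negative weights in the prototype, pushing documents containing them away from the class.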
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• P(c): the probability of observing a document in class c;
• P(d|c): the probability of observing the document d given a class c;
• P(d): the probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

\[ P(c|d) = \frac{P(c)\, P(d|c)}{P(d)} \tag{2.7} \]
When performing classification, each document d is assigned to the class c_j with the highest probability:

\[ \operatorname{argmax}_{c_j} \frac{P(c_j)\, P(d|c_j)}{P(d)} \tag{2.8} \]

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant and irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, reflecting whether or not the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \tag{2.9} \]

In the formula, N(d_i,t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document, t_k \in V_{d_i}, are used.
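The multinomial model of Eq. (2.9) can be sketched as follows, computed in log space to avoid underflow, and with Laplace (add-one) smoothing so that unseen term-class pairs do not yield zero probabilities. The smoothing step and the toy labelled documents are assumptions added for the example.

```python
from collections import Counter
from math import log

def train_multinomial_nb(docs, labels):
    """Estimate P(c) and P(t|c) from tokenized, labelled training documents."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    vocab = {t for doc in docs for t in doc}
    cond = {c: {t: (counts[c][t] + 1) / (sum(counts[c].values()) + len(vocab))
                for t in vocab}                 # Laplace smoothing
            for c in classes}
    return prior, cond, vocab

def predict(doc, prior, cond, vocab):
    """Eq. 2.9 in log space: argmax_c log P(c) + sum_k N(d,t_k) log P(t_k|c)."""
    scores = {c: log(prior[c])
                 + sum(n * log(cond[c][t])
                       for t, n in Counter(doc).items() if t in vocab)
              for c in prior}
    return max(scores, key=scores.get)

# hypothetical tokenized recipes labelled by the user's reaction
docs = [["salt", "pepper", "salt"], ["sugar", "vanilla"], ["salt", "garlic"]]
labels = ["savoury", "sweet", "savoury"]
prior, cond, vocab = train_multinomial_nb(docs, labels)
pred = predict(["salt", "salt", "garlic"], prior, cond, vocab)
```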
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all the training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of these nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is inefficiency at classification time: since they have no training phase, all the computation takes place when classifying a new item.
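A minimal k-nearest-neighbor sketch over structured data with the Euclidean distance, as described above. The (calories, grams of fat) feature vectors are hypothetical.

```python
from collections import Counter
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k nearest training examples.
    `training` is a list of (feature_vector, label) pairs."""
    nearest = sorted(training, key=lambda ex: euclidean(item, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# hypothetical (calories, grams of fat) features for dishes
training = [((90, 1), "light"), ((120, 2), "light"),
            ((480, 30), "heavy"), ((520, 35), "heavy"), ((610, 40), "heavy")]
label = knn_classify((100, 3), training, k=3)
```

Note that all the distance computations happen inside `knn_classify`, illustrating why classification time, not training time, is the bottleneck of this family of methods.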
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before; moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

          Item1   Item2   Item3   Item4   Item5
Alice       5       3       4       4       ?
User1       3       1       2       3       3
User2       4       3       4       3       5
User3       3       3       1       5       4
User4       1       5       5       2       1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity measure results from computing the cosine of the angle between the two vectors:
\[ Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\, \sqrt{\sum_{s \in S} r_{b,s}^2}} \tag{2.10} \]
In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average rating of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
\[ sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2}\, \sqrt{\sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}} \tag{2.11} \]
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction, using a common prediction function:
\[ pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \tag{2.12} \]
In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
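Using the data from Table 2.1, the Pearson similarity of Eq. (2.11) and the prediction function of Eq. (2.12) can be sketched as follows. Restricting the neighbourhood to User1 and User2, and computing each user's mean over all of their rated items, are choices made for this example; implementations differ on both points.

```python
from math import sqrt

ratings = {  # Table 2.1 (Alice has not rated Item5)
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def pearson(a, b):
    """Eq. 2.11, computed over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = mean(a), mean(b)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    da = sqrt(sum((ratings[a][s] - ra) ** 2 for s in common))
    db = sqrt(sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / (da * db) if da and db else 0.0

def predict(a, item, neighbours):
    """Eq. 2.12: the user's mean plus the similarity-weighted,
    mean-centred ratings of the neighbours who rated the item.
    In practice, only positively correlated neighbours are used."""
    sims = [(pearson(a, b), b) for b in neighbours if item in ratings[b]]
    num = sum(s * (ratings[b][item] - mean(b)) for s, b in sims)
    return mean(a) + num / sum(s for s, _ in sims)

pred = predict("Alice", "Item5", ["User1", "User2"])  # about 4.85
```

Both neighbours rated Item5 above their own averages, so the prediction for Alice ends up well above her average of 4.0.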
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and to recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters
of the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods.
The system must first learn the user's preferences from previously rated items in order to
perform accurate recommendations. Several techniques have been proposed to address this problem.
Most of them use the hybrid recommendation approach presented in the next section. Other
techniques use strategies based on item popularity, item entropy, user personalization, and
combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until
a new item is rated by a sufficient number of users, the recommender system will not recommend
it. Hybrid methods can also address this problem. Data sparsity is another problem that should
be considered, since the number of rated items is usually very small when compared to the number of
ratings that need to be predicted. User profile information, like age, gender, and other attributes, can
also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several
limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to
avoid some shortcomings, and even reach desirable properties not present in the individual approaches.
Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or
knowledge sources from each strategy to generate a recommendation. An example of this design is
content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order
to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components receive the same input as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned
dynamically. This design can be applied to two components that perform well individually but
complement each other in different situations (e.g., when few ratings exist, one should recommend
popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is then exploited by the principal component
to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base
algorithms can be improved by being hybridized with other techniques. It is important that the
recommendation techniques used in a hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective, many variables can and have been studied: increases in the number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of
recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user, or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp} \quad (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn} \quad (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing accuracy at the level
of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \quad (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating
for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on
larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \quad (2.16)
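Both error measures can be sketched directly from Eqs. (2.15) and (2.16); the rating lists are illustrative:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); squaring the deviations
    penalizes large errors more heavily than MAE does."""
    return math.sqrt(sum((p - r) ** 2
                         for p, r in zip(predicted, actual)) / len(actual))

predicted = [3.5, 4.0, 2.0]
actual = [4.0, 4.0, 4.0]
# The single large error (2.0 vs. 4.0) affects RMSE more than MAE.
```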
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000
would be awarded to anyone who presented an algorithm with an accuracy improvement (in RMSE) of
10% over Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches The
works described in this chapter contain interesting features to further explore in the context of per-
sonalized food recommendation using content-based methods
3.1 Food Preference Extraction for Personalized Cooking Recipe
Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients I^+_k, an equation based on the idea of TF-IDF is used:
I^+_k = FF_k \times IRF_k \quad (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D} \quad (3.2)
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe
Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain
ingredient k (M_k):
IRF_k = \log \frac{M}{M_k} \quad (3.3)
The user's disliked ingredients I^-_k are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
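Eqs. (3.1)-(3.3) can be sketched as follows; the recipe corpus and browsing history are hypothetical toy data, and the natural logarithm is assumed, since the base is not fixed in the excerpt:

```python
import math
from collections import Counter

def favourite_ingredients(browsed, corpus, days):
    """Score each ingredient k in the user's history by FF_k * IRF_k:
    the frequency of use over a period of `days` (Eq. 3.2), times
    log(M / M_k) over the whole recipe corpus (Eq. 3.3)."""
    M = len(corpus)
    recipe_freq = Counter(ing for recipe in corpus for ing in recipe)
    use_freq = Counter(ing for recipe in browsed for ing in recipe)
    return {ing: (f / days) * math.log(M / recipe_freq[ing])
            for ing, f in use_freq.items()}

# Hypothetical data: 4 recipes in the corpus, 2 browsed by the user.
corpus = [{"egg", "flour"}, {"egg", "sugar"}, {"egg", "milk"}, {"tofu"}]
browsed = [{"egg", "flour"}, {"egg", "sugar"}]
scores = favourite_ingredients(browsed, corpus, days=7)
# "egg" is used most often, but it also appears in most recipes, so the
# IRF term dampens its score relative to rarer ingredients.
```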
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and
one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses
coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted
by I^+_k, were computed. The F-measure is computed as follows:
F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by I^+_k for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail, because the accuracy values
obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This does not correspond to real eating
habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considers the ingredient quantities of a target
recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is
obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \quad (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity
of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed
average quantity of ingredient k over all the recipes in the database). According to the deviation
score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering
the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^+_k and I^-_k, respectively):
Score(R) = \sum_{k \in R} (I_k \cdot W_k) \quad (3.6)
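A sketch of Eqs. (3.5) and (3.6); since the excerpt does not specify how the weight W_k is derived from the deviation score, the weights below are passed in as assumed inputs, and all quantities are hypothetical:

```python
import math

def ingredient_sigma(quantities):
    """Standard deviation of an ingredient's quantity over the n
    recipes that contain it (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe, preference, weight):
    """Recipe score (Eq. 3.6): sum over the recipe's ingredients of
    the user's preference I_k times the quantity-based weight W_k."""
    return sum(preference[k] * weight[k] for k in recipe)

# Hypothetical quantities (grams) of pepper across three recipes.
sigma_pepper = ingredient_sigma([1.0, 1.5, 2.0])
# A disliked ingredient (negative I_k) with a large weight lowers the score.
score = recipe_score({"pepper", "potato"},
                     preference={"pepper": -1.0, "potato": 0.5},
                     weight={"pepper": 2.0, "potato": 0.3})   # -1.85
```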
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly
reduces the impact that sparse data has on prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization
problem: the movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed as the weighted average of deviations from the neighbors' means. Both of
these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure
content-based predictor and the pure collaborative method.
CBCF essentially consists of performing collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by user u, where available, and
those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \quad (3.7)
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if he has rated
only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar
users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid
correlation weight is explained in more detail in [20].
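The construction of the pseudo user-ratings vector (Eq. 3.7) is straightforward to sketch; the item names and prediction values below are illustrative:

```python
def pseudo_ratings(actual, content_pred, items):
    """Eq. (3.7): use the actual rating where the user provided one,
    and the content-based prediction otherwise, producing a dense
    vector suitable for collaborative filtering."""
    return [actual.get(i, content_pred[i]) for i in items]

items = ["m1", "m2", "m3"]
actual = {"m1": 5}                                 # user rated only m1
content_pred = {"m1": 4.2, "m2": 3.1, "m3": 2.0}   # content-based estimates
v = pseudo_ratings(actual, content_pred, items)    # [5, 3.1, 2.0]
```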
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE values of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down recipes into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ from
one another in the approach used to compute user similarity: the hybrid recipe method identifies
a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity
is based on the ingredient scores obtained after the recipes' ratings are broken down. Lastly, an
intelligent strategy was implemented, in which only the positive ratings for items that receive mixed
ratings are considered, under the assumption that ingredients common to recipes with mixed ratings
are not the cause of the high variation in score. The results of the study are presented in Figure 3.2,
using the normalized
Figure 3.2: Normalized MAE scores for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve: as mentioned earlier, there are many other factors influencing a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction: in some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based news article recommendation system. When
helping the user obtain more knowledge about a news topic, a certain variety should exist in the
recommendations. Items too similar to others already known by the user probably carry the
same information, and will not help him gather more information about a particular news topic;
these items are therefore excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should make great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new, unlabeled item, the
algorithm compares it to all stored items using a similarity function, and then determines the nearest
neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the
story to be classified become voting stories. The predicted score is then computed as the weighted
average over all the voting stories' scores, using the similarity values as weights. If a voter is closer
than a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is
labeled as known, because the system assumes that the user is already aware of the event reported
in it, and there is no need to recommend a story he already knows. If the story does not have any
voters, it cannot be classified by the short-term model, and is passed to the long-term model,
explained in more detail in [23].
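The short-term voting scheme can be sketched as follows; the threshold values and story scores are hypothetical, and the long-term model itself is out of scope here:

```python
def short_term_predict(story_sims, scores, t_min=0.3, t_max=0.95):
    """Stories with similarity above t_min to the new story become
    voters; the prediction is the similarity-weighted average of the
    voters' scores. A voter above t_max marks the story as already
    known, and no voters at all defers to the long-term model."""
    voters = {s: sim for s, sim in story_sims.items() if sim >= t_min}
    if not voters:
        return None, "long-term"    # cannot classify; pass along
    if max(voters.values()) >= t_max:
        return None, "known"        # near-duplicate of a seen story
    pred = (sum(sim * scores[s] for s, sim in voters.items())
            / sum(voters.values()))
    return pred, "predicted"

# Hypothetical similarities of stored stories to the new story.
pred, status = short_term_predict({"s1": 0.5, "s2": 0.2, "s3": 0.4},
                                  {"s1": 0.8, "s2": 0.1, "s3": 0.4})
```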
This issue should be taken into consideration in food recommendation, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm to
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component, measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way
both users rate the same set of items, whereas in the item-to-item approach, the similarity value
between a pair of items is measured by the way they are rated by a shared set of users. In other
words, in user-to-user, two users are considered similar if they rate the same set of items in a similar
way, whereas in the item-to-item approach, two items are considered similar if they were rated in a
similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \quad (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of
recipes a and b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)} \quad (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u on item a, and N is the set of items
rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted
according to the similarity between b and the target item a, and the predicted rating is then
normalized by the sum of the similarities.
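Eqs. (4.1) and (4.2) can be sketched as follows; the ratings are toy data, and the item averages are taken over all users who rated each item (the excerpt does not state whether they should be restricted to the shared set P):

```python
import math

def item_sim(ratings, a, b):
    """Item-to-item Pearson correlation (Eq. 4.1) over the users P
    who rated both items a and b."""
    P = [u for u, r in ratings.items() if a in r and b in r]
    def avg(i):
        vals = [r[i] for r in ratings.values() if i in r]
        return sum(vals) / len(vals)
    ra, rb = avg(a), avg(b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in P)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in P)
                    * sum((ratings[u][b] - rb) ** 2 for u in P))
    return num / den if den else 0.0

def item_predict(ratings, u, a):
    """Eq. (4.2): weight user u's rating of each rated item b by the
    similarity between b and the target item a, normalizing by the
    sum of similarities."""
    rated = [b for b in ratings[u] if b != a]
    sims = {b: item_sim(ratings, a, b) for b in rated}
    den = sum(sims.values())
    return (sum(sims[b] * ratings[u][b] for b in rated) / den) if den else None

ratings = {"u1": {"a": 4, "b": 5},
           "u2": {"a": 2, "b": 1},
           "u3": {"b": 4, "c": 5}}
prediction = item_predict(ratings, "u3", "a")  # item b dominates: 4.0
```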
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is
simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and
compute the predicted ratings from there. Another reason why the item-based collaborative approach
was chosen was already mentioned in Section 2.1.2: empirical evidence suggests that item-based
algorithms can provide better computational performance and comparable or better quality results
than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values
to the profile vector.
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vectors and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \quad (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is
important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since this method of transforming a similarity measure into a
rating is likely to introduce a small error in the results. Another approximation comes from the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
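A sketch of the heuristic in Eq. (4.3); since the text does not define how avgTotal combines the two averages, the arithmetic mean of the user and item averages is assumed here:

```python
def similarity_to_rating(similarity, user_avg, item_avg,
                         threshold=0.8, bonus=0.5):
    """Eq. (4.3): return the combined user/item average, adding a
    small bonus when the recipe is very similar to the user profile.
    The mean of the two averages is an assumed combination rule."""
    avg_total = (user_avg + item_avg) / 2
    return avg_total + bonus if similarity > threshold else avg_total

high = similarity_to_rating(0.9, user_avg=4.0, item_avg=3.0)  # 4.0
low = similarity_to_rating(0.5, user_avg=4.0, item_avg=3.0)   # 3.5
```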
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining a user's favourite ingredients using the TF-IDF variation [3], the user's overall
ingredient preferences could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights for Rocchio's algorithm is presented.
Next, two different approaches to build the users' prototype vectors are introduced, and, lastly,
the problem of transforming a similarity measure into a rating value is presented, along with the
solutions explored in this work.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in
Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure can be used to attribute weights to a recipe's features
and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to
always be 1. The main reason is the absence of timestamps in the datasets' reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as defined in [3]:
IRF_k = \log \frac{M}{M_k} \quad (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over
the complete dataset.
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each, determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine whether a rating event is considered a positive or a
negative observation, two different approaches were studied. The first approach is simple: the lower
rating values are considered negative observations, and the higher rating values are positive
observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative
observations, and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5;
the same process is applied to this dataset, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach uses the user's average rating value,
computed from the training set: if a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are obtained directly from the rating events contained in the training set. Depending on the
observation, a recipe's feature weights are added to or subtracted from the user prototype vector:
in positive observations, the recipe's feature weights, determined by the IRF value, are added to
the vector; in negative observations, the feature weights are subtracted.
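The first (fixed-threshold) variant of the prototype construction can be sketched as follows; the ingredient IRF weights and ratings are hypothetical, and the neutral-rating handling used for the Food.com dataset is omitted for brevity:

```python
from collections import defaultdict

def build_prototype(rated_recipes, irf, threshold):
    """Build a user's prototype vector: each rated recipe is a
    positive observation (rating >= threshold) or a negative one,
    both weighted 1; a feature's IRF weight is added for positive
    observations and subtracted for negative ones."""
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for k in features:
            prototype[k] += sign * irf[k]
    return dict(prototype)

irf = {"garlic": 0.4, "cilantro": 1.2}      # hypothetical IRF weights
rated = [({"garlic", "cilantro"}, 4),       # positive observation
         ({"cilantro"}, 1)]                 # negative observation
proto = build_prototype(rated, irf, threshold=3)  # Epicurious-style split
# garlic: +0.4; cilantro: +1.2 - 1.2 = 0.0
```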
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches
to translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max normalization.
Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. The following steps were applied: compute each user's similarity variation from
the validation set and each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A) − min(A)),
the user average was used as the default for the recommendation.
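A minimal sketch of this per-user Min-Max mapping, including the fallback to the user average when the similarity interval is degenerate (function and parameter names are assumptions of this illustration):

```python
def minmax_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity value A in [sim_min, sim_max] onto the user's
    rating range [C, D] using Eq. 4.5; fall back to the user's average
    rating when there is not enough data to form a similarity interval."""
    if sim_max == sim_min:          # degenerate interval: too few ratings
        return user_avg
    scaled = (sim - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For instance, a similarity of 0.5 in a [0, 1] similarity interval maps to the middle of a [1, 5] rating range.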
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L    (4.6)
Three different approaches were tested: using the user's rating average and standard deviation,
using the recipe's rating average and standard deviation, and using the combination of the user
and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
³http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
Number of users                         24,741         8,117
Number of food items                   226,025        14,976
Number of rating events                956,826        86,574
Number of ratings above avg            726,467        46,588
Number of groups                           108            68
Number of ingredients                    5,074           338
Number of categories                        28            14
Sparsity of the ratings matrix           0.02%         0.07%
Avg rating value                          4.68          3.34
Avg number of ratings per user           38.67         10.67
Avg number of ratings per item            4.23          5.78
Avg number of ingredients per item        8.57          3.71
Avg number of categories per item         2.33          0.60
Avg number of food groups per item        0.87          0.61
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine the final rating
recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity
thresholds used in this method, U and L respectively, will be optimized to obtain the best
recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L
is 0.25.
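Eq. 4.6 can be sketched as a simple piecewise function (an illustration only; the default thresholds mirror the initial U = 0.75 and L = 0.25):

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average rating up or down by one standard
    deviation depending on where the similarity falls relative to the
    upper (U) and lower (L) thresholds."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```

Whether `avg` and `std` come from the user, the item, or their combination gives the three variants tested in the text.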
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large
online⁴ recipe-sharing community. The second dataset is composed of crawled data obtained from a
website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated
recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated
no more than 3 times were removed, as well as all users with no more than 5 ratings. Table 4.1
presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
⁴http://www.food.com  ⁵http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen
by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures
4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
⁶http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections analyse two interesting aspects of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different sets of p observations as the validation set;
ideally, it is repeated until all possible combinations are tested. The validation results
are averaged over the number of repetitions (see Fig. 5.1). In the experiments
performed in this work, the data was split into 5 parts, so the process is repeated 5 times, a procedure also known
as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
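The fold construction described above might look like the following sketch (an illustration, not the evaluation module used in the experiments):

```python
def k_fold_splits(events, k=5):
    """Partition rating events into k folds; each fold serves once as the
    validation set (20% of the data for k=5) while the remaining folds
    form the training set."""
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

In practice, shuffling the events before splitting avoids ordering artifacts; it is omitted here to keep the sketch deterministic.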
Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
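For reference, the two measures can be computed as follows (a standard textbook implementation, not taken from the thesis code):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute deviation between the
    predicted and the actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on larger errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

The squaring step is what makes RMSE penalize large misses more heavily than MAE, a distinction that matters in the threshold experiments of Section 5.4.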
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item averages,
i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                               Epicurious           Food.com
                               MAE      RMSE        MAE      RMSE
YoLP Content-based component   0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component   0.6454   0.8678      0.3761   0.6834
User Average                   0.6315   0.8338      0.4077   0.6207
Item Average                   0.7701   1.0930      0.4385   0.7043
Combined Average               0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                         Epicurious                          Food.com
                               Observation      Observation        Observation      Observation
                               User Average     Fixed Threshold    User Average     Fixed Threshold
                               MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Std Dev        0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Std Dev        0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and
Item Std Dev                   0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                        0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
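The three average-based baselines amount to a trivial predictor, sketched here (function and mode names are illustrative):

```python
def baseline_prediction(user_avg, item_avg, mode="combined"):
    """The three simple baselines: predict the user's average rating,
    the item's average rating, or the mean of the two."""
    if mode == "user":
        return user_avg
    if mode == "item":
        return item_avg
    return (user_avg + item_avg) / 2   # (UserAvg + ItemAvg) / 2
```

Despite their simplicity, Table 5.1 shows these averages are competitive, which is why they serve as a meaningful floor for the experimental component.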
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building
the users' prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the row entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using the fixed
threshold of 3 to separate the positive and negative observations. The second conclusion that can
be drawn from these results is that the combination of both user and item average ratings and
standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                   Epicurious         Food.com
                                   MAE      RMSE      MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927    0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012    0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986    0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616    0.4324   0.7087
Ingredients                        0.7932   1.0054    0.4411   0.6537
Cuisine                            0.8553   1.0810    0.5357   0.7431
Dietary                            0.8772   1.0807    0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods, it is important to determine whether all features are helping to obtain
the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform
the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
row of Table 5.3.
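The merge-then-compare step can be sketched as follows, assuming each stored sub-vector is a dictionary of IRF weights and that feature names are disjoint across the three types (an assumption of this illustration, not a statement about the actual implementation):

```python
import math

def merge_vectors(*subvectors):
    """Merge the per-feature-type prototype sub-vectors (e.g. ingredients,
    cuisines, dietaries) selected for a given feature-combination test."""
    merged = {}
    for sub in subvectors:
        merged.update(sub)   # assumes feature names are disjoint across types
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict: feature -> weight)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing, say, Ingredients + Cuisine then reduces to calling `merge_vectors(ingredients_vec, cuisine_vec)` before the similarity computation, with no prototype rebuilding.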
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information about them is available. Although this is
confirmed in this test (see Table 5.3), it may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items
they dislike, so it is important to test the impact of every new feature before implementing it in the
recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L

The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and discover the similarity thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and
Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if similarity < U    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better results
when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Considering the small size of this dataset, and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, showing that the absolute error starts to stagnate for users
with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a determined amount of
reviews. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users who rated over 40 recipes; this was the highest threshold
chosen for this dataset in order to maintain a considerable number of users over which to average the
recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found who rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
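The round-by-round protocol of this test can be sketched generically as follows, where `build`, `predict`, and `error` are placeholders for the prototype construction, the similarity-to-rating step, and the absolute error (this is an illustration of the protocol, not the thesis code):

```python
def learning_curve(events, build, predict, error):
    """For a single user: each round moves one more review into the
    training set, rebuilds the model, and averages the error over the
    remaining (validation) reviews."""
    curve = []
    for n in range(1, len(events)):
        training, validation = events[:n], events[n:]
        model = build(training)
        errs = [error(rating, predict(model, item)) for item, rating in validation]
        curve.append(sum(errs) / len(errs))
    return curve
```

Averaging these per-user curves over the selected user groups (71, 1,571, or 269 users) produces the curves plotted in Figures 5.8 to 5.10.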
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation
was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various
approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into the rating value needed to compute the performance of the recommendation
system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
These being two datasets with very different characteristics, not improving on the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, which were chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors, and, added to the major difference in the dataset
sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendation, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored
in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example: season of the year (i.e., winter/fall or summer/spring), time of day (i.e.,
lunch or dinner), total meal cost, total calories, amongst others. The study of the impact that these
features have on the recommendation is another interesting direction to approach in the future,
when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each
user could represent their preferences. Built from the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203-259, 2001. ISSN 09241868. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98-105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76-87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325-341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551-585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412-420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.
[13] N. Lshii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395-403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127-134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2002. ISSN 09241868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381-386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147-180, 2000. ISSN 09241868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995.
10450823 doi 101067mod2000109031
51
Resumo
Esta dissertação explora a aplicabilidade de métodos baseados em conteúdo para recomendações personalizadas no domínio alimentar. A recomendação neste domínio é uma área relativamente nova, existindo poucos sistemas implementados em ambiente real que se baseiam nas preferências de utilizadores. Métodos utilizados frequentemente noutras áreas, como o algoritmo de Rocchio na classificação de documentos, podem ser adaptados para recomendações no domínio alimentar. Com o objectivo de explorar métodos baseados em conteúdo na área de recomendação alimentar, foi desenvolvida uma plataforma para avaliar a aplicabilidade do algoritmo de Rocchio aplicado a este domínio. Para além da validação do algoritmo explorado neste estudo, foram efectuados outros testes, como o impacto do desvio padrão no erro de recomendação e a curva de aprendizagem do algoritmo.
Palavras-chave: Sistemas de Recomendação, Recomendação Baseada em Conteúdo, Comida, Receita, Aprendizagem Autónoma
Abstract
Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods to personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.
Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing
Contents
Acknowledgments iii
Resumo v
Abstract vii
List of Tables xi
List of Figures xiii
Acronyms xv
1 Introduction 1
1.1 Dissertation Structure 2
2 Fundamental Concepts 3
2.1 Recommendation Systems 3
2.1.1 Content-Based Methods 4
2.1.2 Collaborative Methods 9
2.1.3 Hybrid Methods 12
2.2 Evaluation Methods in Recommendation Systems 14
3 Related Work 17
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation 17
3.2 Content-Boosted Collaborative Recommendation 19
3.3 Recommending Food: Reasoning on Recipes and Ingredients 21
3.4 User Modeling for Adaptive News Access 22
4 Architecture 25
4.1 YoLP Collaborative Recommendation Component 25
4.2 YoLP Content-Based Recommendation Component 27
4.3 Experimental Recommendation Component 28
4.3.1 Rocchio's Algorithm using FF-IRF 28
4.3.2 Building the Users' Prototype Vector 29
4.3.3 Generating a rating value from a similarity value 29
4.4 Database and Datasets 31
5 Validation 35
5.1 Evaluation Metrics and Cross Validation 35
5.2 Baselines and First Results 36
5.3 Feature Testing 38
5.4 Similarity Threshold Variation 39
5.5 Standard Deviation Impact in Recommendation Error 42
5.6 Rocchio's Learning Curve 43
6 Conclusions 47
6.1 Future Work 48
Bibliography 49
List of Tables
2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe-ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Food.com rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10-Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using the Epicurious dataset 39
5.3 Lower similarity threshold variation test using the Food.com dataset 40
5.4 Upper similarity threshold variation test using the Epicurious dataset 40
5.5 Upper similarity threshold variation test using the Food.com dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes 44
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes 44
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Square Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Building on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones appear every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can be applied to food recommendation, is therefore a topic of great interest.
In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, in order to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and their most interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the related work discussed in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
bull Knowledge-based recommendation systems
bull Content-based recommendation systems
bull Collaborative recommendation systems
bull Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about a user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, which is explained later in this section.
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
bull Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
bull Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free-text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight relates to the relevance associated between it and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_ij = f_ij / max_z f_zj    (2.1)

where, for a document j and a keyword i, f_ij corresponds to the number of times that i appears in j. This value is divided by the maximum f_zj, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = log(N / n_i)    (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:
w_ij = TF_ij × IDF_i    (2.3)
It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when assigning a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:
w_ij = TF-IDF_ij / sqrt(Σ_{z=1..K} (TF-IDF_zj)²)    (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
Similarity(a, b) = Σ_k w_ka w_kb / (sqrt(Σ_k w_ka²) · sqrt(Σ_k w_kb²))    (2.5)
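To make Eqs. (2.1)–(2.5) concrete, the following Python sketch computes normalized TF-IDF vectors for a small corpus of tokenized documents and compares them with cosine similarity. It is an illustrative implementation only, not the code used in this work; all function and variable names are my own.

```python
import math

def tf_idf_vectors(docs):
    """Normalized TF-IDF weight vectors (Eqs. 2.1-2.4) for tokenized documents."""
    n = len(docs)
    vocab = {t for d in docs for t in d}
    # IDF_i = log(N / n_i), Eq. (2.2)
    idf = {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}
    vectors = []
    for d in docs:
        counts = {t: d.count(t) for t in set(d)}
        max_f = max(counts.values())
        # TF_ij = f_ij / max_z f_zj (Eq. 2.1); w_ij = TF x IDF (Eq. 2.3)
        w = {t: (f / max_f) * idf[t] for t, f in counts.items()}
        # cosine normalization, Eq. (2.4)
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({t: v / norm for t, v in w.items()} if norm else w)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (Eq. 2.5)."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0
```

With this representation, two documents that share no keywords have similarity 0, while a normalized vector has similarity 1 with itself.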
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, the document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector c_i = (w_1i, ..., w_|T|i) for each class c_i, where T is the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:
w_ki = β Σ_{d_j ∈ POS_i} w_kj / |POS_i| − γ Σ_{d_j ∈ NEG_i} w_kj / |NEG_i|    (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_kj is the TF-IDF weight for term k in document d_j. Parameters β and γ control the influence of the positive and negative examples. A document d_j is assigned to the class c_i with the highest similarity value between the prototype vector c_i and the document vector d_j.
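The prototype computation of Eq. (2.6), and the similarity-based classification that follows it, can be sketched as below. This is an illustrative sketch: documents are assumed to be sparse dictionaries of TF-IDF weights, and the defaults β = 16 and γ = 4 are classical choices from the relevance feedback literature, not values taken from this work.

```python
import math

def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    """Prototype vector for one class (Eq. 2.6): the beta-weighted average of
    the positive example vectors minus the gamma-weighted average of the
    negative ones. Documents are sparse dicts of TF-IDF weights."""
    proto = {}
    for docs, coeff in ((pos, beta / len(pos)), (neg, -gamma / len(neg))):
        for d in docs:
            for term, w in d.items():
                proto[term] = proto.get(term, 0.0) + coeff * w
    return proto

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse vectors."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0

def rocchio_classify(prototypes, doc):
    """Assign doc to the class whose prototype vector is most similar to it."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc))
```

Used for content-based filtering, the classes could simply be "relevant" and "irrelevant", with each prototype built from the items the user rated positively or negatively.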
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
bull P(c): probability of observing a document in class c
bull P(d|c): probability of observing the document d given a class c
bull P(d): probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

P(c|d) = P(c) P(d|c) / P(d)    (2.7)
When performing classification, each document d is assigned to the class c_j with the highest probability:

argmax_{c_j} P(c_j) P(d|c_j) / P(d)    (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant and irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of words in a document. The second, typically referred to as the multinomial event model, counts the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j|d_i) = P(c_j) Π_{t_k ∈ V ∩ d_i} P(t_k|c_j)^N(d_i,t_k)    (2.9)

In the formula, N(d_i,t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document are used.
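A minimal multinomial Naive Bayes trainer and classifier along the lines of Eq. (2.9) might look as follows. Laplace smoothing (the `alpha` parameter) is a standard addition not discussed in the text, included so that unseen term-class pairs do not zero out the product; log-probabilities are used to avoid numeric underflow.

```python
import math
from collections import Counter

def train_multinomial_nb(labeled_docs, alpha=1.0):
    """Estimate log P(c) and log P(t|c) from (tokens, class) training pairs."""
    classes, vocab = {}, set()
    for tokens, c in labeled_docs:
        stats = classes.setdefault(c, {"n_docs": 0, "counts": Counter()})
        stats["n_docs"] += 1
        stats["counts"].update(tokens)
        vocab.update(tokens)
    total_docs = sum(s["n_docs"] for s in classes.values())
    model = {}
    for c, s in classes.items():
        total_terms = sum(s["counts"].values())
        model[c] = {
            "log_prior": math.log(s["n_docs"] / total_docs),
            # Laplace-smoothed term likelihoods over the whole vocabulary
            "log_p": {t: math.log((s["counts"][t] + alpha) /
                                  (total_terms + alpha * len(vocab)))
                      for t in vocab},
        }
    return model

def nb_classify(model, tokens):
    """Pick the class maximizing log P(c_j) + sum_k N(d, t_k) log P(t_k|c_j)."""
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(m["log_p"][t] for t in tokens
                                    if t in m["log_p"])
    return max(model, key=score)
```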
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms, branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is performed at classification time.
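The nearest neighbor procedure described above can be sketched in a few lines. The similarity function is passed in as a parameter, matching the observation that its choice depends on the type of data; this is an illustrative sketch rather than a tuned implementation.

```python
from collections import Counter

def knn_classify(train, query, k, similarity):
    """k-nearest-neighbor classification: keep all (item, label) training
    pairs in memory, rank them by similarity to the query, and take a
    majority vote among the top-k labels."""
    neighbors = sorted(train, key=lambda pair: similarity(pair[0], query),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For VSM-represented items the `similarity` argument would be cosine similarity; for simple structured data, a negated distance works just as well.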
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = Σ_{s∈S} r_as r_bs / (sqrt(Σ_{s∈S} r_as²) · sqrt(Σ_{s∈S} r_bs²))    (2.10)
In the formula, r_as is the rating that user a gave to item s, and r_bs is the rating that user b gave to the same item s. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way; the difference in rating values between the four items is practically constant. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items common between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average rating of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = Σ_{s∈S} (r_as − r̄_a)(r_bs − r̄_b) / sqrt(Σ_{s∈S} (r_as − r̄_a)² · Σ_{s∈S} (r_bs − r̄_b)²)    (2.11)
In the formula, r̄_a and r̄_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = r̄_a + Σ_{b∈N} sim(a, b) (r_bp − r̄_b) / Σ_{b∈N} sim(a, b)    (2.12)
In the formula, pred(a, p) is the predicted rating for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
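Applying Eqs. (2.11) and (2.12) to the ratings of Table 2.1 gives a concrete prediction for Alice's unseen Item5. This is an illustrative sketch: restricting the neighborhood N to positively correlated users is a simplifying assumption for the example, not a choice prescribed by the text.

```python
import math

# Ratings from Table 2.1; None marks Alice's unknown Item5.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items co-rated by both users."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    ma = sum(x for x, _ in common) / len(common)
    mb = sum(y for _, y in common) / len(common)
    num = sum((x - ma) * (y - mb) for x, y in common)
    den = math.sqrt(sum((x - ma) ** 2 for x, _ in common) *
                    sum((y - mb) ** 2 for _, y in common))
    return num / den if den else 0.0

def predict(user, item):
    """Weighted-sum prediction (Eq. 2.12) over positively correlated neighbors."""
    own = [x for x in ratings[user] if x is not None]
    mean_a = sum(own) / len(own)
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row[item] is None:
            continue
        s = pearson(ratings[user], row)
        if s <= 0:
            continue  # keep only positively correlated neighbors
        rated = [x for x in row if x is not None]
        mean_b = sum(rated) / len(rated)
        num += s * (row[item] - mean_b)
        den += s
    return mean_a + (num / den if den else 0.0)
```

Running `predict("Alice", 4)` yields a rating of roughly 4.9: only User1 and User2 correlate positively with Alice, and both rated Item5 above their own averages, so the prediction lands above Alice's average of 4.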
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems. Until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered. The number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user or dramas liked by a user) in order to improve the results.

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, and otherwise use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.

In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}    (2.14)
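A minimal sketch of both measures, with invented counts for the four outcome states:

```python
# Precision (Eq. 2.13) and Recall (Eq. 2.14) from recommendation outcomes.
def precision(tp, fp):
    # fraction of recommended items that were actually relevant
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of relevant items that were actually recommended
    return tp / (tp + fn)

# Suppose the system recommended 10 items: 6 were relevant (tp), 4 were
# not (fp), and 2 relevant items were never recommended (fn).
print(precision(6, 4))  # 0.6
print(recall(6, 2))     # 0.75
```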
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
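Both error measures can be sketched directly from Eqs. (2.15) and (2.16); the rating lists below are invented:

```python
import math

# MAE (Eq. 2.15) and RMSE (Eq. 2.16) over paired predicted/actual ratings.
def mae(predicted, actual):
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [3.5, 4.0, 2.0]
actual    = [4.0, 4.0, 4.0]
print(mae(predicted, actual))   # one large error of 2.0 dominates
print(rmse(predicted, actual))  # RMSE exceeds MAE, penalizing the outlier more
```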
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I_k^+, an equation based on the idea of TF-IDF is used:
I_k^+ = FF_k \times IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D}    (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
IRF_k = \log \frac{M}{M_k}    (3.3)
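The FF-IRF idea of Eqs. (3.1)-(3.3) can be sketched as follows; the recipe corpus and the user's usage counts are invented for the example:

```python
import math

# Sketch of FF-IRF: score an ingredient by how often the user chooses it
# (FF_k, Eq. 3.2) and how rare it is across recipes (IRF_k, Eq. 3.3).
recipes = [
    {"egg", "flour", "sugar"},
    {"egg", "milk"},
    {"rice", "egg"},
    {"rice", "chicken"},
]

def irf(ingredient, recipes):
    # IRF_k = log(M / M_k)
    M = len(recipes)
    Mk = sum(1 for r in recipes if ingredient in r)
    return math.log(M / Mk)

def ff(ingredient, uses, period_days):
    # FF_k = F_k / D: times the user picked the ingredient in the period
    return uses.get(ingredient, 0) / period_days

uses = {"egg": 5, "rice": 1}  # hypothetical browsing/cooking counts
score = ff("egg", uses, 30) * irf("egg", recipes)  # I+_k = FF_k * IRF_k
print(score)
```

Like IDF for words, a common ingredient (egg appears in 3 of 4 recipes) gets a low IRF, so only genuinely distinctive preferences score highly.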
The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I_k^+ were computed. The F-measure is computed as follows:
F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams from two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k in all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I_k^+ and I_k^-, respectively):
Score(R) = \sum_{k \in R} (I_k \cdot W_k)    (3.6)
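Eqs. (3.5) and (3.6) can be sketched as below. The ingredient quantities, preference values I_k, and weights W_k are invented; since the text does not detail how W_k is derived from \sigma_k, the weights are given directly as assumed inputs:

```python
import math

# Sketch of Eqs. (3.5)-(3.6): ingredient dispersion and recipe scoring.
def std_dev(quantities):
    # sigma_k over the n recipes that contain ingredient k (Eq. 3.5)
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def score(recipe, I, W):
    # Score(R) = sum over ingredients k in R of I_k * W_k (Eq. 3.6)
    return sum(I[k] * W[k] for k in recipe)

print(std_dev([2, 3, 2, 5]))              # grams of pepper across four recipes

I = {"pepper": -0.5, "potato": 0.8}       # preference: liked (>0) / disliked (<0)
W = {"pepper": 1.4, "potato": 0.2}        # hypothetical deviation-based weights
print(round(score({"pepper", "potato"}, I, W), 10))  # -0.54
```

With these numbers the disliked pepper, carrying the larger weight, drives the recipe's score negative despite the liked potato.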
The approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' mean. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u where available, and those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}    (3.7)
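The construction of Eq. (3.7) can be sketched as follows; the items, actual ratings, and content-based estimates are invented:

```python
# Sketch of Eq. (3.7): build a dense pseudo user-ratings vector from actual
# ratings where available and content-based predictions otherwise.
def pseudo_ratings(actual, content_pred, items):
    return {i: actual.get(i, content_pred[i]) for i in items}

items = ["m1", "m2", "m3", "m4"]
actual = {"m1": 5, "m3": 2}                          # ratings the user really gave
content_pred = {"m1": 4, "m2": 3, "m3": 2, "m4": 4}  # content-based estimates
v = pseudo_ratings(actual, content_pred, items)
print(v)  # {'m1': 5, 'm2': 3, 'm3': 2, 'm4': 4}
```

Filling every user's vector this way yields the dense pseudo ratings matrix V over which the Pearson similarities are then computed.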
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, as it is assumed that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.

This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as usually users are not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u on item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of the similarities.
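The two steps above can be sketched as follows. The ratings are invented, and as a simplification the Pearson averages are computed over the co-rating users only:

```python
import math

# Sketch of Eqs. (4.1)-(4.2): item-to-item Pearson similarity over shared
# raters, then a similarity-weighted prediction from the user's rated items.
def item_sim(ratings, a, b):
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    avg_a = sum(ratings[u][a] for u in users) / len(users)
    avg_b = sum(ratings[u][b] for u in users) / len(users)
    num = sum((ratings[u][a] - avg_a) * (ratings[u][b] - avg_b) for u in users)
    den = math.sqrt(sum((ratings[u][a] - avg_a) ** 2 for u in users)) * \
          math.sqrt(sum((ratings[u][b] - avg_b) ** 2 for u in users))
    return num / den if den else 0.0

def predict(ratings, u, a, rated_items):
    num = sum(item_sim(ratings, a, b) * ratings[u][b] for b in rated_items)
    den = sum(item_sim(ratings, a, b) for b in rated_items)
    return num / den if den else 0.0

ratings = {
    "u1": {"r1": 5, "r2": 4, "r3": 5},
    "u2": {"r1": 4, "r2": 5, "r3": 3},
    "u3": {"r1": 1, "r2": 2, "r3": 2},
    "u4": {"r2": 4, "r3": 2},
}
print(predict(ratings, "u4", "r1", ["r2", "r3"]))  # close to u4's own ratings
```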
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
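The matching described above can be sketched with binary feature vectors compared by cosine similarity; the feature vocabulary and recipes below are invented:

```python
import math

# Sketch of content-based matching: recipes and the user profile as binary
# feature vectors, ranked by cosine similarity.
features = ["cat:dessert", "cat:main", "region:north", "ing:egg", "ing:rice"]

def to_vector(active):
    # 1 where the feature is present, 0 elsewhere (fixed feature positions)
    return [1 if f in active else 0 for f in features]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# profile built from the features of positively rated recipes (ratings 4-5)
profile = to_vector({"cat:dessert", "ing:egg"})
recipe_a = to_vector({"cat:dessert", "ing:egg", "region:north"})
recipe_b = to_vector({"cat:main", "ing:rice"})
print(cosine(profile, recipe_a) > cosine(profile, recipe_b))  # True
```

Recipe A shares two active features with the profile and ranks first; recipe B shares none and gets similarity zero.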
YoLP recipe recommendations take the form of a list, and, in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach was simple: the lower rating values are considered a negative observation, and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Foodcom dataset, ratings range from 1 to 5, and the same process is applied, with the exception of ratings equal to 3: in this case, these are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in the next Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As was explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
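The update rule described above can be sketched as follows; the recipes, IRF weights, and ratings are invented for the example:

```python
# Sketch of the prototype-vector construction: add a recipe's IRF feature
# weights on a positive observation, subtract them on a negative one.
def update_prototype(prototype, recipe_weights, positive):
    sign = 1 if positive else -1
    for feature, w in recipe_weights.items():
        prototype[feature] = prototype.get(feature, 0.0) + sign * w

prototype = {}
# user rated recipe A positively and recipe B negatively
update_prototype(prototype, {"egg": 0.3, "flour": 0.7}, positive=True)
update_prototype(prototype, {"egg": 0.3, "rice": 1.2}, positive=False)
print(prototype)  # {'egg': 0.0, 'flour': 0.7, 'rice': -1.2}
```

Features that appear in both liked and disliked recipes cancel out (egg), while distinctly liked or disliked features keep a clear positive or negative weight.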
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
29
vector so a method is needed to translate the similarity into a rating This topic is very important to
explore since it can introduce considerate errors in the validation results Next two approaches are
presented to translate the similarity value into a rating
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula3:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C     (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were therefore applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
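The per-user Min-Max mapping, including the fallback to the user average when the similarity interval collapses, can be sketched as below. The function name and argument layout are illustrative assumptions.

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity value onto the user's rating range (Eq. 4.5).
    sim_min/sim_max are the user's observed similarity extremes;
    rating_min/rating_max are the user's rating extremes. When the
    similarity interval collapses (too few ratings), fall back to the
    user's average rating."""
    if sim_max - sim_min == 0:
        return user_avg
    scaled = (sim - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```

For instance, a similarity halfway through a user's similarity interval is mapped to the midpoint of that user's rating range.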
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating − standard deviation,   if similarity < L     (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
  Number of users                         24,741       8,117
  Number of food items                   226,025      14,976
  Number of rating events                956,826      86,574
  Number of ratings above avg            726,467      46,588
  Number of groups                           108          68
  Number of ingredients                    5,074         338
  Number of categories                        28          14
  Sparsity on the ratings matrix           0.02%       0.07%
  Avg rating values                         4.68        3.34
  Avg number of ratings per user           38.67       10.67
  Avg number of ratings per item            4.23        5.78
  Avg number of ingredients per item        8.57        3.71
  Avg number of categories per item         2.33        0.60
  Avg number of food groups per item        0.87        0.61
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
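Eq. 4.6 translates directly into a small piecewise function; the sketch below is illustrative, with the default thresholds taken from the text.

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average rating up or down by one standard
    deviation depending on where the similarity falls relative to the
    upper (U) and lower (L) thresholds."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```

Depending on the configuration tested, `avg` and `std` are the user's, the recipe's, or the combination of both.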
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community4. The second dataset is composed of data crawled from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
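The filtering step can be sketched as below. The text does not say whether the filter was applied once or repeated until stable; the iterative repetition here is an assumption (removing items can push users below the cut-off and vice versa), as are the function and parameter names.

```python
from collections import Counter

def filter_sparse(events, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times and users with no more
    than 5 ratings. Repeats until the dataset is stable, since each pass
    can push other users/items below the cut-offs (assumption).
    events: list of (user_id, item_id, rating) tuples."""
    events = list(events)
    while True:
        item_counts = Counter(i for _, i, _ in events)
        user_counts = Counter(u for u, _, _ in events)
        kept = [(u, i, r) for u, i, r in events
                if item_counts[i] >= min_item_ratings
                and user_counts[u] >= min_user_ratings]
        if len(kept) == len(events):
            return kept
        events = kept
```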
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com    5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data; instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, k-fold cross-validation was used: the data is partitioned into k segments, and each segment in turn serves as the validation set while the remaining observations form the training set. To reduce variability, this process is repeated for every fold, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, k was set to 5, a setup known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
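The 5-fold partitioning described above can be sketched as follows; the shuffling seed and the interleaved fold assignment are illustrative assumptions.

```python
import random

def five_fold_splits(events, seed=42):
    """Shuffle the rating events and yield 5 (training, validation)
    pairs; each validation fold holds roughly 20% of the data and the
    training set the remaining 80%."""
    events = list(events)
    random.Random(seed).shuffle(events)
    k = 5
    for fold in range(k):
        validation = events[fold::k]  # every k-th event, offset by fold
        training = [e for i, e in enumerate(events) if i % k != fold]
        yield training, validation
```

Every event appears in exactly one validation fold across the 5 iterations, so the whole dataset is used for both training and evaluation.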
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
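The two error measures are standard; a minimal sketch follows.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; squaring makes larger deviations weigh
    more than in MAE."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```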
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious           Food.com
                                  MAE      RMSE        MAE      RMSE
  YoLP Content-based component    0.6389   0.8279      0.3590   0.6536
  YoLP Collaborative component    0.6454   0.8678      0.3761   0.6834
  User Average                    0.6315   0.8338      0.4077   0.6207
  Item Average                    0.7701   1.0930      0.4385   0.7043
  Combined Average                0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                            Epicurious                            Food.com
                                    Observation      Observation        Observation      Observation
                                    User Average     Fixed Threshold    User Average     Fixed Threshold
                                    MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
  User Avg + User Standard
  Deviation                         0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
  Item Avg + Item Standard
  Deviation                         0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
  User/Item Avg + User and Item
  Standard Deviation                0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
  Min-Max                           0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                       Epicurious          Food.com
                                       MAE      RMSE       MAE      RMSE
  Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
  Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
  Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
  Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
  Ingredients                          0.7932   1.0054     0.4411   0.6537
  Cuisine                              0.8553   1.0810     0.5357   0.7431
  Dietary                              0.8772   1.0807     0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
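The merging of per-feature-type vectors and the cosine similarity can be sketched as below, with sparse vectors stored as dictionaries; these representational choices are assumptions.

```python
import math

def merge_vectors(*vectors):
    """Merge per-feature-type prototype vectors (e.g. ingredients,
    cuisines, dietaries) selected for a given feature-combination test.
    Feature names are assumed disjoint across types."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination then amounts to choosing which of the 3 stored vectors to pass to `merge_vectors` before computing the similarity.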
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating − standard deviation,   if similarity < L
The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U     (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
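The sweep over candidate thresholds can be sketched as below, applying the Eq. 5.1 rule to a list of (similarity, actual rating) pairs and recording the MAE per threshold; the function name and event representation are illustrative assumptions.

```python
def sweep_upper_threshold(events, avg, std, thresholds):
    """Re-apply the Eq. 5.1 rating rule over validation events for each
    candidate upper threshold U and record the resulting MAE.
    events: list of (similarity, actual_rating) pairs."""
    results = {}
    for u in thresholds:
        errors = [abs((avg + std if sim >= u else avg) - actual)
                  for sim, actual in events]
        results[u] = sum(errors) / len(errors)
    return results
```

The threshold with the lowest recorded error can then be selected with `min(results, key=results.get)`, keeping in mind the MAE/RMSE trade-off discussed next.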
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help explain the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted and actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were observed towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users who rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found who rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
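For a single user, the round-by-round simulation can be sketched as below. The evaluation callback stands in for the full train-and-predict cycle; its signature and the function name are assumptions for illustration.

```python
def learning_curve(user_events, evaluate, max_train=40):
    """Simulate incremental learning for one user: grow the training set
    one review at a time (removing it from the validation set) and
    measure the error on the remaining reviews.
    evaluate(train, validation) -> error is supplied by the caller."""
    curve = []
    for n in range(1, max_train + 1):
        train, validation = user_events[:n], user_events[n:]
        if not validation:
            break  # no reviews left to validate against
        curve.append((n, evaluate(train, validation)))
    return curve
```

Averaging these per-user curves over the selected user groups (e.g. the 71 Epicurious users with over 40 rated recipes) yields the curves plotted in Figs. 5.8 through 5.10.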
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. As these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding the major difference in the dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web
4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink
springercom101007978-3-540-72079-9
[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive
Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl
acmorgcitationcfmid=1248566
49
[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification
In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN
0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload
doi=1011659324amprep=rep1amptype=pdf
[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-
egorization In Proceedings of the Fourteenth International Conference on Machine
Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics
bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011
329956amprep=rep1amptype=pdf$delimiter026E30F$npapers2publicationuuid
23DB36B5-2348-44C4-B831-DBDD6EC7702D
[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for
collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172
x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+
Analysis+of+Predictive+Algorithms+for+Collaborative+Filtering0
[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A
Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and
Data Engineering 17(6)734ndash749 2005
[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender
Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms
In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403
1998
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444 of LNCS, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
Resumo

Esta dissertação explora a aplicabilidade de métodos baseados em conteúdo para recomendações personalizadas no domínio alimentar. A recomendação neste domínio é uma área relativamente nova, existindo poucos sistemas implementados em ambiente real que se baseiam nas preferências de utilizadores. Métodos utilizados frequentemente noutras áreas, como o algoritmo de Rocchio na classificação de documentos, podem ser adaptados para recomendações no domínio alimentar. Com o objectivo de explorar métodos baseados em conteúdo na área de recomendação alimentar, foi desenvolvida uma plataforma para avaliar a aplicabilidade do algoritmo de Rocchio aplicado a este domínio. Para além da validação do algoritmo explorado neste estudo, foram efectuados outros testes, como o impacto do desvio padrão no erro de recomendação e a curva de aprendizagem do algoritmo.

Palavras-chave: Sistemas de Recomendação, Recomendação Baseada em Conteúdo, Comida, Receita, Aprendizagem Autónoma
Abstract

Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my MSc dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing
Contents

Acknowledgments iii
Resumo v
Abstract vii
List of Tables xi
List of Figures xiii
Acronyms xv
1 Introduction 1
1.1 Dissertation Structure 2
2 Fundamental Concepts 3
2.1 Recommendation Systems 3
2.1.1 Content-Based Methods 4
2.1.2 Collaborative Methods 9
2.1.3 Hybrid Methods 12
2.2 Evaluation Methods in Recommendation Systems 14
3 Related Work 17
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation 17
3.2 Content-Boosted Collaborative Recommendation 19
3.3 Recommending Food: Reasoning on Recipes and Ingredients 21
3.4 User Modeling for Adaptive News Access 22
4 Architecture 25
4.1 YoLP Collaborative Recommendation Component 25
4.2 YoLP Content-Based Recommendation Component 27
4.3 Experimental Recommendation Component 28
4.3.1 Rocchio's Algorithm using FF-IRF 28
4.3.2 Building the Users' Prototype Vector 29
4.3.3 Generating a rating value from a similarity value 29
4.4 Database and Datasets 31
5 Validation 35
5.1 Evaluation Metrics and Cross Validation 35
5.2 Baselines and First Results 36
5.3 Feature Testing 38
5.4 Similarity Threshold Variation 39
5.5 Standard Deviation Impact in Recommendation Error 42
5.6 Rocchio's Learning Curve 43
6 Conclusions 47
6.1 Future Work 48
Bibliography 49
List of Tables

2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe - ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Food.com rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10-Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using the Epicurious dataset 39
5.3 Lower similarity threshold variation test using the Food.com dataset 40
5.4 Upper similarity threshold variation test using the Epicurious dataset 40
5.5 Upper similarity threshold variation test using the Food.com dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes 44
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes 44
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.
In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview on recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described, the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2

Fundamental Concepts

In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview on hybrid recommendation systems.
2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance associated between the term and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)

In the formula, N is the total number of documents, and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:
w_{ij} = TF_{ij} \times IDF_i \qquad (2.3)
It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when giving a weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:
w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^2}} \qquad (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \qquad (2.5)
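As a concrete illustration of Eqs. (2.1)-(2.5), the weighting and similarity steps can be sketched in a few lines of Python. The toy corpus and all function names are illustrative, not part of this work's implementation:

```python
import math

def tf_idf_vectors(docs):
    """Cosine-normalized TF-IDF vectors (Eqs. 2.1-2.4) for tokenized documents."""
    N = len(docs)
    vocab = sorted({t for d in docs for t in d})
    df = {t: sum(1 for d in docs if t in d) for t in vocab}  # n_i in Eq. (2.2)
    vectors = []
    for d in docs:
        counts = {t: d.count(t) for t in set(d)}
        max_f = max(counts.values())  # max_z f_zj in Eq. (2.1)
        w = [(counts.get(t, 0) / max_f) * math.log(N / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0  # Eq. (2.4)
        vectors.append([x / norm for x in w])
    return vocab, vectors

def cosine(a, b):
    """Cosine similarity between two weight vectors (Eq. 2.5)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

For instance, two recipes sharing an ingredient term receive a positive similarity, while recipes with no terms in common score exactly zero.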
Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system, according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c_i} = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, with T being the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters β and γ control the influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c_i} and the document vector \vec{d_j}.
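The prototype computation of Eq. (2.6) and the subsequent cosine-based assignment can be sketched as follows. The β and γ defaults and the tiny two-dimensional vectors are illustrative only, not values used in this work:

```python
import math

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rocchio_prototype(pos, neg, beta=1.0, gamma=0.5):
    """Prototype vector of one class (Eq. 2.6); beta/gamma are illustrative."""
    dim = len(pos[0])
    proto = [0.0] * dim
    for d in pos:                      # positive examples, weighted by beta
        for k in range(dim):
            proto[k] += beta * d[k] / len(pos)
    for d in neg:                      # negative examples, weighted by gamma
        for k in range(dim):
            proto[k] -= gamma * d[k] / len(neg)
    return proto

def classify(doc, prototypes):
    """Assign doc to the class whose prototype is most cosine-similar."""
    return max(prototypes, key=lambda c: cosine(doc, prototypes[c]))
```

Training one prototype per class from liked and disliked examples, and assigning each new document vector to the closest prototype, is the whole learning and classification loop.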
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers

Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c
• P(d|c): probability of observing the document d given a class c
• P(d): probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \qquad (2.7)

When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \qquad (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of words in a document. The second, typically referred to as the multinomial event model, accounts for the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \qquad (2.9)

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
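A minimal sketch of the multinomial model of Eq. (2.9), computed in log space to avoid numeric underflow. Add-one (Laplace) smoothing is included so that unseen terms do not zero out the product; the smoothing is an addition for robustness, not part of Eq. (2.9) itself, and all names are illustrative:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    """Estimate the priors P(c) and the term probabilities P(t|c)
    from tokenized training documents, with add-one smoothing."""
    vocab = sorted({t for d in docs for t in d})
    priors = {c: labels.count(c) / len(labels) for c in set(labels)}
    cond = {}
    for c in priors:
        counts = Counter(t for d, l in zip(docs, labels) if l == c for t in d)
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond

def classify_nb(doc, priors, cond):
    """argmax_c of the log form of Eq. (2.9):
    log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    vocab = next(iter(cond.values()))
    counts = Counter(t for t in doc if t in vocab)  # N(d, t_k), in-vocabulary only
    scores = {c: math.log(priors[c])
                 + sum(n * math.log(cond[c][t]) for t, n in counts.items())
              for c in priors}
    return max(scores, key=scores.get)
```

The log form leaves the argmax unchanged, since the logarithm is monotonic, while keeping the scores in a numerically safe range even for long documents.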
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is made at classification time.
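The memory-based classifier described above can be sketched as follows, using cosine similarity and a majority vote over the k nearest neighbors (the training vectors and labels are illustrative):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(item, training, k=3):
    """training is simply kept in memory as (vector, label) pairs;
    all the work happens now, at classification time, as noted above."""
    neighbours = sorted(training, key=lambda ex: cosine(item, ex[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Note that the full pass over the training set on every query is exactly the classification-time cost the text points out.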
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods

Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity results from computing the cosine of the angle between the two vectors:

Similarity(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \qquad (2.10)
In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
Figure 2.2: Comparing user ratings [2]

In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items in common between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \qquad (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using any of these two similarity measures, we can now generate a prediction, using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)

In the formula, pred(a, p) is the predicted rating for user a on item p, and N is the set of users most similar to user a that have rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
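Using the ratings in Table 2.1, Eqs. (2.11) and (2.12) can be made concrete. The choice of neighbour set, all positively correlated users that rated the item, and the use of each neighbour's overall average rating are illustrative assumptions of this sketch:

```python
import math

ratings = {  # Table 2.1; None marks Alice's unknown rating for Item5
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def pearson(a, b):
    """Eq. (2.11), computed over the items rated by both users."""
    common = [i for i, (x, y) in enumerate(zip(ratings[a], ratings[b]))
              if x is not None and y is not None]
    ra = sum(ratings[a][i] for i in common) / len(common)
    rb = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - ra) * (ratings[b][i] - rb) for i in common)
    den = (math.sqrt(sum((ratings[a][i] - ra) ** 2 for i in common))
           * math.sqrt(sum((ratings[b][i] - rb) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(a, item):
    """Eq. (2.12); N is taken as all positively correlated users
    that rated the item (an illustrative choice)."""
    rated = [r for r in ratings[a] if r is not None]
    ra = sum(rated) / len(rated)
    num = den = 0.0
    for b in ratings:
        if b == a or ratings[b][item] is None:
            continue
        sim = pearson(a, b)
        if sim <= 0:
            continue
        rb = sum(ratings[b]) / len(ratings[b])  # user b's average rating
        num += sim * (ratings[b][item] - rb)
        den += sim
    return ra if den == 0 else ra + num / den
```

For this data, Alice correlates positively with User1 and User2 and negatively with User4, and her predicted rating for Item5 comes out close to 4.9.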
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
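As an illustration, the same cosine measure can be applied to the item columns of Table 2.1 instead of the user rows. This sketch uses the plain cosine of Eq. (2.10) over the users that rated both items, not the adjusted-cosine variant often used in item-based systems:

```python
import math

# item columns from Table 2.1, indexed by user order (Alice, User1..User4);
# None marks an unrated cell
item_ratings = {
    "Item1": [5, 3, 4, 3, 1],
    "Item2": [3, 1, 3, 3, 5],
    "Item5": [None, 3, 5, 4, 1],
}

def item_cosine(p, q):
    """Plain cosine between two item columns, over users that rated both."""
    pairs = [(x, y) for x, y in zip(item_ratings[p], item_ratings[q])
             if x is not None and y is not None]
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0
```

Item-item similarities computed this way can be precomputed offline, which is one reason item-based approaches scale well in practice.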
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to make accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even to reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g. movies liked by the user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g. comedies liked by the user, or dramas liked by the user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g. when few ratings exist, one should recommend popular items; otherwise, collaborative methods should be used).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)
Recall measures the completeness of the recommendations, i.e. the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}    (2.14)
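The two measures can be computed directly from the four counts; a minimal sketch, with illustrative counts:

```python
# Precision (Eq. 2.13): fraction of recommended items that were relevant.
# Recall (Eq. 2.14): fraction of relevant items that were recommended.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Example: 8 recommended items were liked, 2 recommended items were
# disliked, and 4 liked items were never recommended.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=4))     # 8/12, roughly 0.667
```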
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
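A minimal sketch of both error measures over paired lists of predicted and actual ratings (the ratings are made up for the example):

```python
import math

# MAE (Eq. 2.15): average absolute deviation between predictions and ratings.
# RMSE (Eq. 2.16): square root of the average squared deviation.

def mae(predicted, actual):
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.5, 2.0, 5.0]
actual    = [3.0, 3.5, 4.0, 5.0]
print(mae(predicted, actual))   # (1 + 0 + 2 + 0) / 4 = 0.75
print(rmse(predicted, actual))  # sqrt((1 + 0 + 4 + 0) / 4), roughly 1.118
```

Note that RMSE exceeds MAE on the same data whenever the errors are unequal, which is exactly the emphasis on larger deviations described above.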
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e. from recipes searched) and cooking history (i.e. recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:
I^{+}_k = FF_k \times IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D}    (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
IRF_k = \log\frac{M}{M_k}    (3.3)
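Eq. (3.3) can be sketched as follows, with recipes represented as ingredient sets (the natural logarithm is assumed here; the base only rescales the weights and does not affect the ranking):

```python
import math

# IRF weight (Eq. 3.3): log of the total number of recipes M over the
# number of recipes M_k containing ingredient k. Ubiquitous ingredients
# get weight 0; rare ingredients get high weights.

def irf_weights(recipes):
    m = len(recipes)
    counts = {}
    for recipe in recipes:
        for ingredient in recipe:
            counts[ingredient] = counts.get(ingredient, 0) + 1
    return {k: math.log(m / mk) for k, mk in counts.items()}

recipes = [
    {"salt", "potato", "pepper"},
    {"salt", "chicken"},
    {"salt", "potato"},
    {"salt", "beef", "pepper"},
]
w = irf_weights(recipes)
print(w["salt"])     # log(4/4) = 0.0 -- appears everywhere, uninformative
print(w["chicken"])  # log(4/1) -- rare, highly discriminative
```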
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+_k, were computed. The F-measure is computed as follows:

F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e. 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2}    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e. the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e. I+_k and I−_k, respectively):

Score(R) = \sum_{k \in R} \left(I_k \cdot W_k\right)    (3.6)
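Eqs. (3.5) and (3.6) can be sketched as follows. The preference and weight values below are purely illustrative, and the paper's exact mapping from the deviation score to W_k is not reproduced here:

```python
import math

# Eq. 3.5: population standard deviation of an ingredient's quantity
# across the n recipes that contain it.
def ingredient_std(quantities):
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

# Eq. 3.6: recipe score as the weighted sum of the user's ingredient
# preferences I_k over the ingredients in the recipe.
def recipe_score(ingredients, preference, weight):
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in ingredients)

print(ingredient_std([10, 20, 30]))  # mean 20 -> sqrt(200/3), roughly 8.165
print(recipe_score({"pepper", "potato"},
                   preference={"pepper": -1.0, "potato": 0.5},
                   weight={"pepper": 2.0, "potato": 0.5}))  # -2.0 + 0.25 = -1.75
```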
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g. title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions using the average of the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u where available, and those predicted by the content-based method otherwise:

v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}    (3.7)
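A minimal sketch of Eq. (3.7), where `content_predict` is a stand-in for the pure content-based predictor (here a dummy that always returns 3):

```python
# Pseudo user-ratings vector: keep the user's actual rating where one
# exists, and fill every other item with a content-based prediction,
# yielding a dense vector for the collaborative step.

def pseudo_ratings(user_ratings, all_items, content_predict):
    return {i: user_ratings[i] if i in user_ratings else content_predict(i)
            for i in all_items}

user_ratings = {"MovieA": 5, "MovieC": 2}
dense = pseudo_ratings(user_ratings, ["MovieA", "MovieB", "MovieC"],
                       content_predict=lambda item: 3)  # dummy predictor
print(dense)  # {'MovieA': 5, 'MovieB': 3, 'MovieC': 2}
```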
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with an MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, based on the assumption that the common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
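A minimal sketch of this voting scheme follows; the threshold values are purely illustrative, as the actual Daily Learner thresholds are not given here:

```python
# Voting stories: neighbors above a minimum similarity vote; the prediction
# is the similarity-weighted average of their scores. A near-duplicate
# (above the maximum threshold) marks the new story as already known, and
# with no voters at all the story falls through to the long-term model.

def classify_story(neighbors, t_min=0.3, t_max=0.95):
    """`neighbors` is a list of (similarity, score) pairs."""
    voters = [(s, score) for s, score in neighbors if s > t_min]
    if not voters:
        return None, "unclassified"  # handled by the long-term model
    if any(s > t_max for s, _ in voters):
        return None, "known"         # user has probably seen this event
    total = sum(s for s, _ in voters)
    return sum(s * score for s, score in voters) / total, "predicted"

print(classify_story([(0.5, 4.0), (0.4, 2.0), (0.1, 5.0)]))
```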
This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the result is normalized by the sum of the similarities to obtain the predicted rating.
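A minimal sketch of Eq. (4.1) in Python, with each recipe represented as a dictionary mapping users to ratings (here each recipe's average is computed over all its ratings, and the set P is the recipes' shared raters; the data is made up):

```python
import math

# Item-to-item Pearson correlation: recipes a and b are compared over the
# users who rated both, with each rating centred on the recipe's average.

def item_similarity(ratings_a, ratings_b):
    shared = set(ratings_a) & set(ratings_b)  # the set P of common raters
    if not shared:
        return 0.0
    avg_a = sum(ratings_a.values()) / len(ratings_a)
    avg_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - avg_a) * (ratings_b[p] - avg_b) for p in shared)
    den = math.sqrt(sum((ratings_a[p] - avg_a) ** 2 for p in shared)) * \
          math.sqrt(sum((ratings_b[p] - avg_b) ** 2 for p in shared))
    return num / den if den else 0.0

# Two recipes rated identically by the same three users are maximally similar.
a = {"u1": 5, "u2": 3, "u3": 1}
b = {"u1": 5, "u2": 3, "u3": 1}
print(item_similarity(a, b))  # 1.0
```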
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide comparable or better quality results than the best available user-based collaborative filtering algorithms, with better computational performance [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
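A minimal sketch of this matching step, with the sparse vectors represented as dictionaries of binary features (the feature names below are invented for illustration):

```python
import math

# Cosine similarity between a user profile and a recipe, both encoded as
# sparse binary feature vectors (category, region, ingredients, ...).

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[f] * v[f] for f in shared)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

profile = {"cat:soup": 1, "region:north": 1, "ing:potato": 1, "ing:kale": 1}
recipe  = {"cat:soup": 1, "region:north": 1, "ing:potato": 1, "ing:chorizo": 1}
print(cosine(profile, recipe))  # 3 shared features over norms 2*2 -> 0.75
```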
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log\frac{M}{M_k}    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations, and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in the next Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula3:

B = (A - minimum value of A) / (maximum value of A - minimum value of A) * (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute all the users' similarity variation from the validation set and all the users' rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as the default for the recommendation.
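A minimal sketch of this per-user mapping, including the fallback to the user average; the function name and the example ranges below are illustrative, not taken from the thesis implementation.

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Eq. 4.5: map a similarity A from the user's similarity range into the
    user's rating range [C, D]; fall back to the user's average rating when
    the similarity interval cannot be computed."""
    if sim_max == sim_min:
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (rating_max - rating_min) + rating_min

# Hypothetical user: similarities observed in [0.1, 0.9], ratings in [2, 5].
print(min_max_rating(0.5, 0.1, 0.9, 2, 5, user_avg=3.5))  # ~3.5 (midpoint maps to midpoint)
print(min_max_rating(0.5, 0.2, 0.2, 2, 5, user_avg=3.5))  # 3.5 (degenerate interval)
```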
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L        (4.6)
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com   Epicurious
Number of users                         24741         8117
Number of food items                   226025        14976
Number of rating events                956826        86574
Number of ratings above avg            726467        46588
Number of groups                          108           68
Number of ingredients                    5074          338
Number of categories                       28           14
Sparsity on the ratings matrix          0.02%        0.07%
Avg rating values                        4.68         3.34
Avg number of ratings per user          38.67        10.67
Avg number of ratings per item           4.23         5.78
Avg number of ingredients per item       8.57         3.71
Avg number of categories per item        2.33         0.60
Avg number of food groups per item       0.87         0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
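Eq. 4.6 with the initial thresholds can be sketched directly; the function name is hypothetical and the averages below are toy values, not figures from the experiments.

```python
def predict_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average rating by one standard deviation up or down
    depending on where the similarity falls relative to the thresholds."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

print(predict_rating(0.80, avg=3.0, std=0.5))  # 3.5
print(predict_rating(0.50, avg=3.0, std=0.5))  # 3.0
print(predict_rating(0.10, avg=3.0, std=0.5))  # 2.5
```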
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community4. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
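The filtering step can be sketched as a single pass over the rating events. Note the assumptions: the thesis does not state whether the filter was iterated until stable, so this one-pass version is a simplification, and the toy events and scaled-down thresholds are invented for illustration.

```python
from collections import Counter

def filter_sparse(ratings, min_item_ratings, min_user_ratings):
    """Keep only rating events whose recipe and user have enough ratings overall."""
    item_counts = Counter(item for _, item, _ in ratings)
    user_counts = Counter(user for user, _, _ in ratings)
    return [(u, i, r) for u, i, r in ratings
            if item_counts[i] >= min_item_ratings
            and user_counts[u] >= min_user_ratings]

# Toy (user, recipe, rating) events; the thesis used larger thresholds
# (recipes rated at most 3 times and users rating at most 5 times were dropped).
events = [("u1", "r1", 4), ("u1", "r2", 3), ("u2", "r1", 2), ("u3", "r3", 5)]
print(filter_sparse(events, min_item_ratings=2, min_user_ratings=2))
# [('u1', 'r1', 4)]
```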
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com    5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when writing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times using different observations as the validation set; ideally, it is repeated until all possible combinations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
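The 5-fold split can be sketched as follows. The shuffle-then-stride scheme below is one common way to build the folds and is an assumption, not necessarily the partitioning used in the experiments.

```python
import random

def five_fold_splits(events, seed=0):
    """Yield 5 (training, validation) pairs; each fold holds out 20% of the
    shuffled rating events for validation and trains on the remaining 80%."""
    events = list(events)
    random.Random(seed).shuffle(events)
    for i in range(5):
        validation = events[i::5]                          # every 5th event, offset i
        training = [e for j, e in enumerate(events) if j % 5 != i]
        yield training, validation

folds = list(five_fold_splits(range(100)))                 # stand-ins for rating events
print(len(folds), len(folds[0][0]), len(folds[0][1]))      # 5 80 20
```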
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
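The two measures reduce to a few lines; the toy rating vectors below are purely illustrative.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; larger deviations weigh more than in MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual, predicted = [4, 3, 5, 2], [3.5, 3, 4, 3]
print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```

The same deviations produce a higher RMSE than MAE whenever the errors are uneven, which is the property exploited later in the threshold-variation analysis.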
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                 Epicurious          Food.com
                                 MAE      RMSE       MAE      RMSE
YoLP Content-based component     0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678     0.3761   0.6834
User Average                     0.6315   0.8338     0.4077   0.6207
Item Average                     0.7701   1.0930     0.4385   0.7043
Combined Average                 0.6628   0.8572     0.4180   0.6250
Table 5.2: Test Results

                                     Epicurious                            Food.com
                                     Observation      Observation         Observation      Observation
                                     User Average     Fixed Threshold     User Average     Fixed Threshold
                                     MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation   0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation   0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item
Standard Deviation                   0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
Min-Max                              0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
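The average baselines amount to simple lookups over the training set; the helper below and the toy events are invented for illustration.

```python
from collections import defaultdict

def build_averages(train):
    """Per-user and per-item average ratings from (user, item, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in train:
        by_user[user].append(rating)
        by_item[item].append(rating)
    return ({u: sum(v) / len(v) for u, v in by_user.items()},
            {i: sum(v) / len(v) for i, v in by_item.items()})

train = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = build_averages(train)
combined = (user_avg["u1"] + item_avg["r1"]) / 2       # (UserAvg + ItemAvg) / 2
print(user_avg["u1"], item_avg["r1"], combined)        # 3.0 4.5 3.75
```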
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616     0.4324   0.7087
Ingredients                         0.7932   1.0054     0.4411   0.6537
Cuisine                             0.8553   1.0810     0.5357   0.7431
Dietary                             0.8772   1.0807     0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With the feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results appear in the respective row of Table 5.3.
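The three stored vectors and their merge can be sketched as sparse dictionaries. The feature names and weights are invented for illustration, and merging with `dict.update` assumes the three feature types never share a key, which holds here since each feature belongs to exactly one type.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def merge(*vectors):
    """Combine the selected per-feature prototype vectors into one profile."""
    merged = {}
    for vec in vectors:
        merged.update(vec)   # feature types are assumed disjoint
    return merged

# Hypothetical stored vectors for one user:
ingredients = {"basil": 0.9, "pasta": 0.4}
cuisines = {"italian": 0.7}
dietaries = {"vegetarian": 0.2}

profile = merge(ingredients, cuisines)        # the "Ingredients + Cuisine" test
recipe = {"basil": 0.9, "italian": 0.7}
print(round(cosine(profile, recipe), 3))
```

Swapping which vectors are passed to `merge` reproduces each row of the feature test without rebuilding anything.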
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between tests.
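The sweep can be sketched as re-scoring the same held-out events under each candidate U. The toy events and the fixed average/standard-deviation predictor below are invented for illustration, not taken from the experiments.

```python
def sweep_upper_threshold(events, thresholds, avg=3.0, std=1.0):
    """For each candidate upper threshold U, apply an Eq. 5.1-style predictor
    (average + std at or above U, average below) and record MAE and RMSE."""
    results = {}
    for u in thresholds:
        errors = [(avg + std if sim >= u else avg) - actual
                  for sim, actual in events]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        results[u] = (mae, rmse)
    return results

# Toy (similarity, actual rating) pairs from a hypothetical validation fold.
events = [(0.9, 4), (0.6, 3), (0.2, 3)]
for u, (mae, rmse) in sweep_upper_threshold(events, [0.5, 0.75]).items():
    print(u, round(mae, 3), round(rmse, 3))
# 0.5 0.333 0.577
# 0.75 0.0 0.0
```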
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
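The round-by-round protocol can be sketched with a deliberately simplified stand-in model: recomputing the user's running average rating takes the place of rebuilding the prototype vector, and the ratings list is invented for illustration.

```python
def learning_curve(ratings, rounds):
    """Each round moves one more review into the training set, retrains the
    stand-in model, and records the MAE over the still-held-out reviews."""
    errors = []
    for n in range(1, rounds + 1):
        train, held_out = ratings[:n], ratings[n:]
        avg = sum(train) / len(train)          # stand-in for profile rebuilding
        errors.append(sum(abs(r - avg) for r in held_out) / len(held_out))
    return errors

ratings = [4, 5, 4, 4, 3, 4, 5, 4]             # one user's reviews, in order
print([round(e, 2) for e in learning_curve(ratings, rounds=4)])
```

In the actual experiments, the retraining step rebuilds the Rocchio prototype vector and the error is averaged over all users in the selected group.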
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22] and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results in transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the users at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to approach in the future, when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems: A landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Abstract
Food recommendation is a relatively new area, with few systems that focus on analysing user preferences being deployed in real settings. In my M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation is explored. Variations of popular approaches used in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide personalized food recommendations. With the objective of exploring content-based methods in this area, a platform was developed to evaluate a variation of the Rocchio algorithm adapted to this domain. Besides the validation of the algorithm explored in this work, other interesting tests were also performed, amongst them recipe feature testing, the impact of the standard deviation on the recommendation error, and the algorithm's learning curve.
Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipes, Machine Learning, Feature Testing
Contents
Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Acronyms
1 Introduction
1.1 Dissertation Structure
2 Fundamental Concepts
2.1 Recommendation Systems
2.1.1 Content-Based Methods
2.1.2 Collaborative Methods
2.1.3 Hybrid Methods
2.2 Evaluation Methods in Recommendation Systems
3 Related Work
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
3.2 Content-Boosted Collaborative Recommendation
3.3 Recommending Food: Reasoning on Recipes and Ingredients
3.4 User Modeling for Adaptive News Access
4 Architecture
4.1 YoLP Collaborative Recommendation Component
4.2 YoLP Content-Based Recommendation Component
4.3 Experimental Recommendation Component
4.3.1 Rocchio's Algorithm using FF-IRF
4.3.2 Building the Users' Prototype Vector
4.3.3 Generating a Rating Value from a Similarity Value
4.4 Database and Datasets
5 Validation
5.1 Evaluation Metrics and Cross-Validation
5.2 Baselines and First Results
5.3 Feature Testing
5.4 Similarity Threshold Variation
5.5 Standard Deviation Impact in Recommendation Error
5.6 Rocchio's Learning Curve
6 Conclusions
6.1 Future Work
Bibliography
List of Tables
2.1 Ratings database for collaborative recommendation
4.1 Statistical characterization for the datasets used in the experiments
5.1 Baselines
5.2 Test Results
5.3 Testing features
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
2.2 Comparing user ratings [2]
2.3 Monolithic hybridization design [2]
2.4 Parallelized hybridization design [2]
2.5 Pipelined hybridization designs [2]
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.7 Evaluating recommended items [2]
3.1 Recipe-ingredient breakdown and reconstruction
3.2 Normalized MAE score for recipe recommendation [22]
4.1 System Architecture
4.2 Item-to-item collaborative recommendation
4.3 Distribution of Epicurious rating events per rating values
4.4 Distribution of Food.com rating events per rating values
4.5 Epicurious distribution of the number of ratings per number of users
5.1 10-Fold Cross-Validation example
5.2 Lower similarity threshold variation test using the Epicurious dataset
5.3 Lower similarity threshold variation test using the Food.com dataset
5.4 Upper similarity threshold variation test using the Epicurious dataset
5.5 Upper similarity threshold variation test using the Food.com dataset
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset
5.8 Learning curve using the Epicurious dataset, up to 40 rated recipes
5.9 Learning curve using the Food.com dataset, up to 100 rated recipes
5.10 Learning curve using the Food.com dataset, up to 500 rated recipes
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that better fit the user's preferences are chosen. The process of filtering high amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is therefore a topic of great interest.
In this work, the applicability of content-based methods in personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods, adapted to food recommendation, is validated with the use of performance metrics that capture the accuracy level of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were also analysed. Furthermore, a feature test was performed to discover the feature combination that better characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, in order to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows: Chapter 2 provides an overview on recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given, and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the related work discussed in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
• Knowledge-based recommendation systems;
• Content-based recommendation systems;
• Collaborative recommendation systems;
• Hybrid recommendation systems.
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements, and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview on hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free-text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance associated between the term and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed by two terms. The first, Term Frequency (TF), is defined as follows:

$$\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)$$

where, for a document $j$ and a keyword $i$, $f_{ij}$ corresponds to the number of times that $i$ appears in $j$. This value is divided by $\max_z f_{zj}$, the maximum frequency observed over all keywords $z$ in document $j$.
Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

$$\mathrm{IDF}_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)$$

In the formula, $N$ is the total number of documents, and $n_i$ represents the number of documents in which the keyword $i$ occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword $i$ in a document $j$ as:

$$w_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i \qquad (2.3)$$
It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is superiorly written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when giving a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

$$w_{ij} = \frac{\mathrm{TF\text{-}IDF}_{ij}}{\sqrt{\sum_{z=1}^{K}(\mathrm{TF\text{-}IDF}_{zj})^2}} \qquad (2.4)$$
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

$$\mathrm{Similarity}(a, b) = \frac{\sum_{k} w_{ka} w_{kb}}{\sqrt{\sum_{k} w_{ka}^2}\sqrt{\sum_{k} w_{kb}^2}} \qquad (2.5)$$
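To make the weighting scheme concrete, the computation of Eqs. (2.1)-(2.5) can be sketched in Python; the function names and the toy document collection are illustrative, not part of this work's implementation:

```python
import math

def tf_idf_vectors(docs):
    """Compute TF-IDF weights (Eqs. 2.1-2.3) for tokenized documents,
    then cosine-normalize each vector (Eq. 2.4)."""
    n = len(docs)
    df = {}  # n_i: number of documents containing keyword i
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        max_f = max(counts.values())  # max_z f_zj
        # w_ij = (f_ij / max_z f_zj) * log(N / n_i)
        w = {t: (f / max_f) * math.log(n / df[t]) for t, f in counts.items()}
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({t: v / norm for t, v in w.items()} if norm else w)
    return vectors

def cosine_similarity(a, b):
    """Eq. 2.5 over sparse weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In the food domain, for instance, treating each recipe's ingredient list as a "document" would yield vectors that can be compared recipe-to-recipe or recipe-to-profile with the same similarity function.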
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class $c$. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each class $c_i$, being $T$ the vocabulary, i.e., the set of distinct terms in the training set. The weight for each term is given by the following formula:

$$w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)$$

In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for class $c_i$, and $w_{kj}$ is the TF-IDF weight for term $k$ in document $d_j$. Parameters $\beta$ and $\gamma$ control the influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
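A minimal sketch of the prototype computation in Eq. (2.6), assuming documents are already represented as sparse TF-IDF dictionaries; the names, and the $\beta = 16$, $\gamma = 4$ defaults (a setting often used in the relevance feedback literature), are assumptions of this sketch:

```python
def rocchio_prototype(pos_docs, neg_docs, beta=16.0, gamma=4.0):
    """Prototype vector for one class (Eq. 2.6): the beta-weighted
    centroid of the positive examples minus the gamma-weighted
    centroid of the negative examples."""
    proto = {}
    for docs, coeff in ((pos_docs, beta / max(len(pos_docs), 1)),
                        (neg_docs, -gamma / max(len(neg_docs), 1))):
        for doc in docs:
            for term, weight in doc.items():
                proto[term] = proto.get(term, 0.0) + coeff * weight
    return proto
```

Classification then assigns a new document to the class whose prototype has the highest cosine similarity with the document's vector.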
Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities, gathered from previously observed data, in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability $P(c|d)$ of a document $d$ belonging to a class $c$, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• $P(c)$: probability of observing a document in class $c$;
• $P(d|c)$: probability of observing the document $d$, given a class $c$;
• $P(d)$: probability of observing the document $d$.
Using these probabilities, the probability $P(c|d)$ of having a class $c$ given a document $d$ can be estimated by applying the Bayes theorem:

$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \qquad (2.7)$$

When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$$\operatorname{argmax}_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \qquad (2.8)$$
The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant and irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of words in a document. The second, typically referred to as the multinomial event model, considers the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary $V$, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \qquad (2.9)$$

In the formula, $N(d_i,t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document ($t_k \in V_{d_i}$) are used.
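The multinomial model of Eq. (2.9) can be sketched as follows, working in log-space to avoid numerical underflow and with Laplace (add-one) smoothing so that unseen terms do not zero out the product; the smoothing choice and all names are assumptions of this sketch:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labeled_docs):
    """Estimate P(c) and P(t_k|c) from (tokens, class) pairs,
    with Laplace smoothing over the vocabulary V."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in labeled_docs:
        class_counts[c] += 1
        term_counts[c].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_counts.values())
    model = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        log_probs = {t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                     for t in vocab}
        model[c] = (math.log(class_counts[c] / n_docs), log_probs)
    return model

def classify(model, tokens):
    """Eq. 2.8 in log-space: argmax_c log P(c) + sum_k N(d,t_k) log P(t_k|c)."""
    counts = Counter(tokens)
    def score(c):
        log_prior, log_probs = model[c]
        return log_prior + sum(n * log_probs[t]
                               for t, n in counts.items() if t in log_probs)
    return max(model, key=score)
```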
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of these nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is made at classification time.
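The procedure just described can be sketched as below, parameterized by any similarity function (e.g., cosine similarity for VSM vectors); a majority vote over the k neighbors is one common way to derive the label, and all names here are illustrative:

```python
from collections import Counter

def knn_classify(train, item, similarity, k=3):
    """Label `item` by majority vote among its k most similar training
    examples; `train` is a list of (vector, label) pairs. All training
    data stays in memory and all work happens at classification time."""
    neighbors = sorted(train, key=lambda pair: similarity(item, pair[0]),
                       reverse=True)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```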
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user, based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example on how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

            Item1   Item2   Item3   Item4   Item5
    Alice     5       3       4       4       ?
    User1     3       1       2       3       3
    User2     4       3       4       3       5
    User3     3       3       1       5       4
    User4     1       5       5       2       1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity results from computing the cosine of the angle between the two vectors:
$$\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\sqrt{\sum_{s \in S} r_{bs}^2}} \qquad (2.10)$$
In the formula, $r_{as}$ is the rating that user $a$ gave to item $s$, and $r_{bs}$ is the rating that user $b$ gave to the same item $s$. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way; the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
$$\mathrm{sim}(a, b) = \frac{\sum_{s \in S}(r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{as} - \bar{r}_a)^2 \sum_{s \in S}(r_{bs} - \bar{r}_b)^2}} \qquad (2.11)$$
In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.
With the similarity values between Alice and the other users, obtained using any of these two similarity measures, we can now generate a prediction using a common prediction function:
$$\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \qquad (2.12)$$
In the formula, $\mathrm{pred}(a, p)$ is the prediction value for user $a$ and item $p$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
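The whole user-based procedure (Eqs. 2.11 and 2.12) can be sketched as below. Two practical deviations from the formulas are assumed here: the denominator uses absolute similarities, so that negatively correlated neighbours do not flip the sign of the adjustment, and the neighbourhood N is simply every other user who rated the item, rather than only the most similar ones:

```python
import math

def pearson(ratings, a, b):
    """Pearson correlation (Eq. 2.11) over the items both users rated;
    `ratings` maps user -> {item: rating}."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = (math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
           * math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common)))
    return num / den if den else 0.0

def predict(ratings, a, item):
    """Eq. 2.12: user a's mean rating, adjusted by the similarity-weighted
    rating deviations of the neighbours who rated `item`."""
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    num = den = 0.0
    for b in ratings:
        if b != a and item in ratings[b]:
            sim = pearson(ratings, a, b)
            mean_b = sum(ratings[b].values()) / len(ratings[b])
            num += sim * (ratings[b][item] - mean_b)
            den += abs(sim)  # absolute weights, a common practical variant
    return mean_a + (num / den if den else 0.0)
```

On the data of Table 2.1, Alice's prediction for Item5 comes out above her 4.0 average, driven by the positively correlated User1 and User2.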
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing rating predictions from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
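A sketch of the item-based prediction step, assuming the item-item similarities have already been precomputed with the cosine- or correlation-based techniques mentioned above; the `item_sim` callable and all names are illustrative:

```python
def item_based_predict(ratings, user, item, item_sim):
    """Item-based prediction in the spirit of [15]: a weighted average of
    the user's own ratings for the items most similar to the target item.
    `item_sim(i, j)` returns a precomputed item-item similarity."""
    num = den = 0.0
    for rated_item, rating in ratings[user].items():
        sim = item_sim(item, rated_item)
        if sim > 0.0:  # keep only positively similar neighbours
            num += sim * rating
            den += sim
    return num / den if den else 0.0
```

Because the item-item similarities can be precomputed offline, the online prediction step only touches the items the target user has actually rated, which is the source of the computational advantage.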
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem; most of them use the hybrid recommendation approach presented in the next section, while others use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. This design is an example of content-boosted collaborative filtering [20], where social features (e.g., movies liked by user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by user, or dramas liked by user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied for two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
\[ \text{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
\[ \text{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
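As an illustration, the two measures above can be computed directly from a recommendation list and the set of items the user actually liked. This is a minimal sketch; the item identifiers are purely illustrative.

```python
def precision_recall(recommended, relevant):
    """Compute Precision (Eq. 2.13) and Recall (Eq. 2.14) for a
    recommendation list against the set of truly relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and relevant
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# 2 of the 4 recommended items are relevant, and 2 of the 3 relevant
# items were recommended: precision = 0.5, recall = 2/3
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```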
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
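The two error measures can be sketched as follows; the rating values are illustrative. Note how the single large error (2.0) dominates RMSE more than MAE.

```python
import math

def mae(predicted, actual):
    # Eq. 2.15: mean absolute deviation between predicted and actual ratings
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Eq. 2.16: square root of the mean squared deviation; larger
    # deviations are penalized more heavily than in MAE
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted, actual = [3.5, 4.0, 2.0], [4.0, 4.0, 4.0]
# MAE = (0.5 + 0.0 + 2.0) / 3; RMSE = sqrt((0.25 + 0.0 + 4.0) / 3)
```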
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients, I+k, an equation based on the idea of TF-IDF is used:
\[ I_k^{+} = FF_k \times IRF_k \tag{3.1} \]
FF_k is the frequency of use (F_k) of ingredient k during a period D:
\[ FF_k = \frac{F_k}{D} \tag{3.2} \]
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency, IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]
The user's disliked ingredients, I-k, are estimated by considering the ingredients in the browsing history with which the user has never cooked.
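The IRF weight from Eq. (3.3) can be sketched as follows; the recipe data is purely illustrative. Rare ingredients receive higher weights than common ones, mirroring IDF in document retrieval.

```python
import math

def irf(recipes, ingredient):
    """Eq. (3.3): IRF_k = log(M / M_k), where M is the total number of
    recipes and M_k the number of recipes containing the ingredient."""
    M = len(recipes)
    Mk = sum(1 for r in recipes if ingredient in r)
    return math.log(M / Mk)

recipes = [{"potato", "onion"}, {"pepper", "onion"}, {"potato", "pepper", "salt"}]
# "salt" appears in 1 of 3 recipes, so IRF = log(3); a more common
# ingredient like "onion" (2 of 3) gets the lower weight log(1.5)
```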
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+k, were computed. The F-measure is computed as follows:
\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients, sorted by I+k, for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k, contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams from two different
1 http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \tag{3.5} \]
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+k and I-k, respectively):
\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
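Eqs. (3.5) and (3.6) can be sketched as follows; the function names and sample values are illustrative, and the mapping from the deviation score to the weight W_k is assumed to be supplied externally, as in the original work.

```python
import math

def ingredient_std(quantities):
    """Eq. (3.5): population standard deviation of ingredient k's
    quantities over the n recipes that contain it."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe, preference, weight):
    """Eq. (3.6): Score(R) = sum of I_k * W_k over the ingredients k
    in recipe R; I_k encodes the user's liked/disliked preference."""
    return sum(preference[k] * weight[k] for k in recipe)
```

For example, a disliked ingredient with a large dispersion-based weight can make the whole recipe score negative, even when the other ingredients are liked.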
The approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method, otherwise:
\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
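The construction of a pseudo user-ratings vector (Eq. 3.7) can be sketched as follows; the item names and rating values are illustrative.

```python
def pseudo_ratings(actual, content_predicted, items):
    """Eq. (3.7): use the real rating r_ui where the user provided one,
    and the content-based prediction c_ui otherwise, producing a dense
    vector over all items."""
    return [actual.get(i, content_predicted[i]) for i in items]

items = ["m1", "m2", "m3"]
actual = {"m1": 5}                              # the user only rated m1
content = {"m1": 4.2, "m2": 3.1, "m3": 2.5}     # content-based predictions
vector = pseudo_ratings(actual, content, items) # dense: [5, 3.1, 2.5]
```

Stacking these dense vectors for all users yields the pseudo ratings matrix V, on which the Pearson-based collaborative step then operates without the original sparsity.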
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differentiate from one another by the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the ratings obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, the assumption being that the common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating, which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information, and will not help him gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic, but not similar in content, should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \times r_{u,b}}{\sum_{b \in N} sim(a, b)} \tag{4.2} \]
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of the similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list, and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]
In the formula, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
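The similarity-to-rating transformation of Eq. (4.3) is a simple threshold rule, which can be sketched as follows; the function name and default arguments are illustrative, while the 0.8 threshold and the 0.5 bonus come from the formula above.

```python
def similarity_to_rating(similarity, avg_total, threshold=0.8, bonus=0.5):
    """Eq. (4.3): approximate a rating from the cosine similarity using
    the combined user and item average (avgTotal); recipes that are very
    similar to the user profile receive a small fixed bonus."""
    return avg_total + bonus if similarity > threshold else avg_total

# With a combined average of 3.5: a highly similar recipe is rated 4.0,
# any other recipe keeps the plain average of 3.5
```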
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the inexistence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
\[ IRF_k = \log \frac{M}{M_k} \tag{4.4} \]
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations, and the higher rating values are considered positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to, or subtracted from, the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
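The prototype-vector update just described can be sketched as follows. This is an illustrative implementation under the stated assumptions (equal weight of 1 for positive and negative observations; IRF-based feature weights); the function and variable names are not from the thesis.

```python
def build_prototype(rated_recipes, irf_weight):
    """Accumulate a user's Rocchio prototype vector: add each rated
    recipe's IRF feature weights on a positive observation, subtract
    them on a negative one (both observation types weighted 1).
    `rated_recipes` is a list of (feature_set, is_positive) pairs."""
    prototype = {}
    for features, positive in rated_recipes:
        sign = 1.0 if positive else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf_weight[f]
    return prototype

weights = {"garlic": 1.0, "cilantro": 2.0}
rated = [({"garlic", "cilantro"}, True),   # positively rated recipe
         ({"cilantro"}, False)]            # negatively rated recipe
profile = build_prototype(rated, weights)
# "garlic" keeps its positive weight, while "cilantro" cancels out
```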
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:
\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. The following steps were thus applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A) − min(A)),
the user's average was used as default for the recommendation.
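The per-user mapping above, including the fallback to the user average, can be sketched as follows (a minimal sketch; the function and parameter names are assumptions):

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Eq. 4.5: map a similarity value A from the user's observed
    interval [min(A), max(A)] into that user's rating range [C, D].
    Fall back to the user's average rating when the similarity
    interval cannot be computed (max(A) - min(A) is zero)."""
    if sim_max == sim_min:
        return user_avg
    return (similarity - sim_min) / (sim_max - sim_min) * (rating_max - rating_min) + rating_min

# Similarity 0.6 observed in [0.2, 0.8], for a user who rates in [2, 5]:
predicted = min_max_rating(0.6, 0.2, 0.8, 2.0, 5.0, user_avg=3.5)
```

Because both scales are computed per user, two users with the same raw similarity can receive different predicted ratings, which is the point of the per-user normalization.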
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = { average rating + standard deviation,  if similarity ≥ U
           average rating,                       if L ≤ similarity < U
           average rating − standard deviation,  if similarity < L }    (4.6)
Three different approaches were tested: using the user's rating average and standard deviation;
using the recipe's rating average and standard deviation; and using the combined average of the
user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                     Food.com   Epicurious
Number of users                        24,741        8,117
Number of food items                  226,025       14,976
Number of rating events               956,826       86,574
Number of ratings above avg           726,467       46,588
Number of groups                          108           68
Number of ingredients                   5,074          338
Number of categories                       28           14
Sparsity of the ratings matrix          0.02%        0.07%
Avg rating value                         4.68         3.34
Avg number of ratings per user          38.67        10.67
Avg number of ratings per item           4.23         5.78
Avg number of ingredients per item       8.57         3.71
Avg number of categories per item        2.33         0.60
Avg number of food groups per item       0.87         0.61
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine the final recommended
rating with more accuracy. Later, in Chapter 5, the upper and lower similarity thresholds used in
this method, U and L respectively, will be optimized to obtain the best recommendation performance;
initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
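Eq. 4.6 can be sketched directly (a minimal sketch; the function name and the choice of default thresholds follow the initial values given above):

```python
def threshold_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Eq. 4.6: predict one standard deviation above or below the
    average rating depending on where the similarity falls relative
    to the upper (U) and lower (L) thresholds."""
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev
```

Depending on the variant tested, `avg_rating` and `std_dev` would be the user's, the recipe's, or the combination of both.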
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system to generate recommendations. The data for the experiments is provided by two datasets. The
first dataset, previously made available by [25], was collected from a large online4 recipe-sharing
community. The second dataset is composed of data crawled from a website named Epicurious5. This
dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce
data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well
as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of
the two datasets after the filter was applied.
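The filtering step can be sketched as below. This is a single-pass sketch over `(user, item, rating)` tuples; whether the thesis iterated the filter to a fixpoint (removing a recipe can push a user below the threshold) is not stated, so that is left as an assumption.

```python
from collections import Counter

def filter_ratings(events, min_item_ratings=4, min_user_ratings=6):
    """Keep only recipes rated at least 4 times (i.e. more than 3) and,
    after that, users with at least 6 remaining ratings (more than 5).
    Events are (user, item, rating) tuples; a single pass is assumed."""
    item_counts = Counter(item for _, item, _ in events)
    kept = [e for e in events if item_counts[e[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [e for e in kept if user_counts[e[0]] >= min_user_ratings]
```

With lower thresholds the same logic is easy to check on a toy event list.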
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas
in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen
by the website's users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4
display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the
distribution of the number of users per number of rated items for the Epicurious dataset. This last
graph is not presented for the Food.com dataset because its curve would be very similar, since a
decrease in the number of users as the number of rated items increases is a normal characteristic
of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients,
cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting aspects of the recommendation process,
using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data; instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, a hold-out style of cross-validation was used, keeping a set
of observations as the validation set and using the remaining observations as the training set. To
reduce variability, this process is repeated multiple times, using different observations as the
validation set; ideally, it is repeated until all possible combinations are tested, and the validation
results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in
this work, the data was split into 5 folds and the process repeated 5 times, also known as 5-fold
cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
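The 5-fold split described above can be sketched as follows (a minimal sketch; the interleaved fold assignment is an assumption — any disjoint 5-way partition works):

```python
def k_fold_splits(events, k=5):
    """Partition the rating events into k folds; each fold serves once
    as the (1/k) validation set while the rest form the training set."""
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, folds[i]
```

The per-fold error values are then averaged to produce the final MAE/RMSE numbers reported in the tables below.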
Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e. the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in
Section 2.2, these measures compute the deviation between the predicted ratings and the actual
ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
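The two error measures are standard and can be stated compactly (a minimal sketch over parallel lists of actual and predicted ratings):

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

The different sensitivity of the two measures to large deviations is what explains the diverging MAE/RMSE behaviour seen later in Section 5.4.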
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baselines were also computed, using the direct values of specific dataset averages as the
predicted rating for the recommendations. The averages computed were the following: user average
rating, recipe average rating, and the combined average of the user and item averages, i.e.
(UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                Epicurious          Food.com
                                MAE     RMSE        MAE     RMSE
YoLP Content-based component    0.6389  0.8279      0.3590  0.6536
YoLP Collaborative component    0.6454  0.8678      0.3761  0.6834
User Average                    0.6315  0.8338      0.4077  0.6207
Item Average                    0.7701  1.0930      0.4385  0.7043
Combined Average                0.6628  0.8572      0.4180  0.6250
Table 5.2: Test results

                                                    Epicurious                        Food.com
                                            Obs. User Avg  Obs. Fixed Th.    Obs. User Avg  Obs. Fixed Th.
                                            MAE     RMSE   MAE     RMSE      MAE     RMSE   MAE     RMSE
User Avg + User Standard Deviation          0.8217  1.0606 0.7759  1.0283    0.4448  0.6812 0.4287  0.6624
Item Avg + Item Standard Deviation          0.8914  1.1550 0.8388  1.1106    0.4561  0.7251 0.4507  0.7207
User/Item Avg + User and Item Std. Dev.     0.8304  1.0296 0.7824  0.9927    0.4390  0.6506 0.4324  0.6449
Min-Max                                     0.8539  1.1533 0.7721  1.0705    0.6648  0.9847 0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the users' prototype vectors were presented: using the user's average rating value as the threshold
for positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also as detailed in Section 4.3, a few different methods are used to convert the similarity value
returned by Rocchio's algorithm into a rating value. These methods are represented in the row
entries of Table 5.2, referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than
using the fixed threshold of 3 to separate positive and negative observations. The second conclusion
that can be drawn from these results is that using the combination of both user and item average
ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE     RMSE        MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927      0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012      0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986      0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616      0.4324  0.7087
Ingredients                         0.7932  1.0054      0.4411  0.6537
Cuisine                             0.8553  1.0810      0.5357  0.7431
Dietary                             0.8772  1.0807      0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients,
cuisine, and dietary. In content-based methods, it is important to determine whether all features
are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested; so, when computing the user prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform. For each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
row of Table 5.3.
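The merge-and-compare step can be sketched as follows. This is a minimal sketch: it assumes feature keys are distinct across the three per-feature vectors (e.g. namespaced by feature type), which the thesis does not state explicitly.

```python
def merge_prototypes(per_feature_vectors, active_features):
    """Merge the separately stored per-feature prototype vectors
    (e.g. 'ingredients', 'cuisine', 'dietary') selected for a given
    feature-combination test."""
    merged = {}
    for name in active_features:
        merged.update(per_feature_vectors[name])
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sum(w * w for w in u.values()) ** 0.5
    norm_v = sum(w * w for w in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination then only requires passing a different `active_features` list, with no rebuilding of the stored vectors.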
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information about them is available. Although this is confirmed
in this test (see Table 5.3), that may not always be the case. Some features, such as, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and
items he dislikes, so it is important to test the impact of every new feature before implementing
it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating = { average rating + standard deviation,  if similarity ≥ U
           average rating,                       if L ≤ similarity < U
           average rating − standard deviation,  if similarity < L }

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendations and discover the similarity thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
This test makes clear that the lower threshold only has a negative effect on the recommendation
accuracy, and that subtracting the standard deviation does not help. The sharp drop in error value
seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = { average rating + standard deviation,  if similarity ≥ U
           average rating,                       if similarity < U }    (5.1)
Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the
test results for the Epicurious and Food.com datasets, respectively. For each similarity value,
represented by the points in the graphs, the MAE and RMSE were obtained by running the same
cross-validation tests multiple times on the experimental recommendation component, adjusting
the upper similarity threshold between tests.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help in understanding the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while for
others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e. users who
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute
error and standard deviation values. The line in these two graphs indicates the average value of
the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Considering the small dimension of this dataset, and the lighter
density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users
with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the
objective of this test is to simulate the continuous learning of the algorithm, using the datasets
studied in this work, and to analyse whether the recommendation error starts to converge after a
certain number of reviews. To perform this test, the datasets were first analysed to find a group
of users with enough rated recipes to study the improvements in the recommendations. The Epicurious
dataset contains 71 users who rated over 40 recipes; this was the highest threshold chosen for this
dataset, in order to maintain a considerable number of users over which to average the
recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found who rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendations, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
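The round-by-round simulation described above can be sketched as follows. This is a minimal sketch with a pluggable predictor; the actual experiment uses the Rocchio component, so the toy predictor below is only for illustration.

```python
def learning_curve(user_events, predict, error):
    """Grow the training set one review at a time and record the mean
    error over the user's remaining (held-out) reviews, simulating
    preferences being learned over time."""
    curve = []
    for n in range(1, len(user_events)):
        train, held_out = user_events[:n], user_events[n:]
        errs = [error(rating, predict(train, item)) for item, rating in held_out]
        curve.append(sum(errs) / len(errs))
    return curve

# Toy predictor for illustration: always return the training average.
avg_predict = lambda train, item: sum(r for _, r in train) / len(train)
abs_error = lambda actual, pred: abs(actual - pred)
curve = learning_curve([("a", 4), ("b", 4), ("c", 4)], avg_predict, abs_error)
```

Averaging such per-user curves over the selected user groups (71, 1,571, or 269 users) yields the plots in Figures 5.8 to 5.10.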
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation,
so various approaches were tested to build the users' prototype vectors and to transform the
similarity value returned by the algorithm into the rating value needed to compute the performance
of the recommendation system. When building the prototype vectors, the approach that returned the
best results used a fixed threshold to differentiate positive and negative observations. The
combination of both user and item average ratings and standard deviations demonstrated the best
results for transforming the similarity value into a rating value. Combined, these approaches
returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
recommendation performance when using the Food.com dataset. With the Epicurious dataset, some
baselines, like the content-based method implemented in YoLP, registered lower error values.
The two datasets have very different characteristics, so not improving on the baseline results
in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information
contains only the main ingredients, chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot
of detail, both in the recipes and in the prototype vectors; adding the major difference in dataset
sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e. that contained user reviews, allowing the studied approaches to be validated.
Since there are very few studies related to food recommendation, the features that best describe
recipes are still undefined. The feature study performed in this work, which explored all the
features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of
all features combined outperforms every feature individually, as well as the pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this
hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e. winter/fall or summer/spring), the time of day
(i.e. lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact
these features have on the recommendations is another interesting direction to approach in the
future, when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors; according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute to it a
predicted rating.
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web
4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink
springercom101007978-3-540-72079-9
[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive
Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl
acmorgcitationcfmid=1248566
49
[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification
In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN
0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload
doi=1011659324amprep=rep1amptype=pdf
[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-
egorization In Proceedings of the Fourteenth International Conference on Machine
Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics
bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011
329956amprep=rep1amptype=pdf$delimiter026E30F$npapers2publicationuuid
23DB36B5-2348-44C4-B831-DBDD6EC7702D
[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for
collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172
x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+
Analysis+of+Predictive+Algorithms+for+Collaborative+Filtering0
[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A
Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and
Data Engineering 17(6)734ndash749 2005
[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender
Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms
In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403
1998
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting
to Know You Learning New User Preferences in Recommender Systems In Proceedings
50
of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN
1581134592 doi 101145502716502737
[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-
orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Trans-
actions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation. Volume 8444 of LNCS, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN
10450823.
Abstract
Food recommendation is a relatively new area, with few systems that focus on analysing user pref-
erences being deployed in real settings. In this MSc dissertation, the applicability of content-based
methods to personalized food recommendation is explored. Variations of popular approaches used
in other areas, such as Rocchio's algorithm for document classification, can be adapted to provide
personalized food recommendations. With the objective of exploring content-based methods in this
area, a system platform was developed to evaluate a variation of the Rocchio algorithm adapted to
this domain. Besides the validation of the algorithm explored in this work, other interesting tests
were also performed, amongst them recipe feature testing, the impact of the standard deviation on
the recommendation error, and the algorithm's learning curve.

Keywords: Recommendation Systems, Content-Based Recommendation, Food Recommendation, Recipe, Machine Learning, Feature Testing
Contents
Acknowledgments iii
Resumo v
Abstract vii
List of Tables xi
List of Figures xiii
Acronyms xv
1 Introduction 1
1.1 Dissertation Structure 2
2 Fundamental Concepts 3
2.1 Recommendation Systems 3
2.1.1 Content-Based Methods 4
2.1.2 Collaborative Methods 9
2.1.3 Hybrid Methods 12
2.2 Evaluation Methods in Recommendation Systems 14
3 Related Work 17
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation 17
3.2 Content-Boosted Collaborative Recommendation 19
3.3 Recommending Food: Reasoning on Recipes and Ingredients 21
3.4 User Modeling for Adaptive News Access 22
4 Architecture 25
4.1 YoLP Collaborative Recommendation Component 25
4.2 YoLP Content-Based Recommendation Component 27
4.3 Experimental Recommendation Component 28
4.3.1 Rocchio's Algorithm using FF-IRF 28
4.3.2 Building the Users' Prototype Vector 29
4.3.3 Generating a rating value from a similarity value 29
4.4 Database and Datasets 31
5 Validation 35
5.1 Evaluation Metrics and Cross Validation 35
5.2 Baselines and First Results 36
5.3 Feature Testing 38
5.4 Similarity Threshold Variation 39
5.5 Standard Deviation Impact in Recommendation Error 42
5.6 Rocchio's Learning Curve 43
6 Conclusions 47
6.1 Future Work 48
Bibliography 49
x
List of Tables
2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of
Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the
area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe - ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Food.com rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10-Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using the Epicurious dataset 39
5.3 Lower similarity threshold variation test using the Food.com dataset 40
5.4 Upper similarity threshold variation test using the Epicurious dataset 40
5.5 Upper similarity threshold variation test using the Food.com dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes 44
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes 44
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Building on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular on e-commerce websites and in online ser-
vices related to movies, music, books, social bookmarking, and product sales in general, and new
ones appear every day. All these areas have one thing in common: users want to explore the space
of options, find interesting items, or even discover new things.

Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can apply to food recommendation, is thus a topic of great
interest.

In this work, the applicability of content-based methods to personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. This work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed. The algorithm's learning curve and the impact of the standard deviation on the
recommendation error were analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview on recommen-
dation systems, introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation
approaches are analysed, and interesting features in the context of personalized food recommen-
dation are highlighted. In Chapter 4, the modules that compose the architecture of the developed
system are described; the recommendation methods are explained in detail, and the datasets are
introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work, and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the related work covered in the following chapter.
These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems

In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based rec-
ommendations. Content-based approaches, instead, relate more to classical Information Retrieval
methods, and focus on keywords as content descriptors to generate recommendations. Because
of this, content-based methods are very popular when recommending documents, news articles,
or web pages, for example.
Knowledge-based systems suggest products based on inferences about users' needs and pref-
erences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify a solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of collabo-
rative and content-based systems, such as the well-known cold-start problem, which is explained later
in this section.

In the rest of this section, some of the most popular approaches for content-based and collabo-
rative methods are described, followed by a brief overview on hybrid recommendation systems.
2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an ob-
ject against a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are
usually employed to structure the information.

Content-based systems are designed mostly for unstructured data, in the form of free text. As
mentioned previously, the content needs to be analysed, and the information in it needs to be trans-
lated into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the
relevance between the term and the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.

There are various term weighting schemes, but the Term Frequency-Inverse Document Fre-
quency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:
TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \qquad (2.1)
where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j.
This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed
over all keywords z in the document j.

Keywords that are present in many documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords become more relevant than frequent keywords. IDF is defined as follows:
IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)
In the formula, N is the total number of documents, and n_i represents the number of documents in
which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword i in a document j as:

w_{i,j} = TF_{i,j} \times IDF_i \qquad (2.3)
It is important to notice that TF-IDF does not identify the context in which the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document, and their occurrence in other documents, are taken into consideration when assigning a
weight to a term.

Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer docu-
ments from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:
w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K}(TF\text{-}IDF_{z,j})^2}} \qquad (2.4)
With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
Similarity(a, b) = \frac{\sum_k w_{k,a} \, w_{k,b}}{\sqrt{\sum_k w_{k,a}^2} \sqrt{\sum_k w_{k,b}^2}} \qquad (2.5)
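As an illustration, the weighting and similarity computations of Eqs. (2.1)-(2.5) can be sketched in Python, here over toy documents represented as dictionaries of term counts (the helper names are illustrative, not part of any existing system):

```python
import math

def tf(term_counts):
    """Eq. (2.1): term frequency, normalized by the most frequent term."""
    max_f = max(term_counts.values())
    return {t: f / max_f for t, f in term_counts.items()}

def idf(docs):
    """Eq. (2.2): inverse document frequency over a list of term-count dicts."""
    n = len(docs)
    terms = {t for d in docs for t in d}
    return {t: math.log(n / sum(1 for d in docs if t in d)) for t in terms}

def tf_idf(doc, idf_weights):
    """Eq. (2.3), followed by the cosine normalization of Eq. (2.4)."""
    w = {t: f * idf_weights.get(t, 0.0) for t, f in tf(doc).items()}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()} if norm > 0 else w

def cosine(a, b):
    """Eq. (2.5): cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Note that a term occurring in every document receives an IDF of zero and thus contributes nothing to the similarity, exactly as intended by Eq. (2.2).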
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system ac-
cording to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document vec-
tors of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then clas-
sified according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1,i}, \ldots, w_{|T|,i}) for each
class c_i, with T being the vocabulary, i.e., the set of distinct terms in the training set. The weight
for each term is given by the following formula:
w_{k,i} = \beta \sum_{d_j \in POS_i} \frac{w_{k,j}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{k,j}}{|NEG_i|} \qquad (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for
class c_i, and w_{k,j} is the TF-IDF weight for term k in document d_j. Parameters β and γ control the
influence of the positive and negative examples. A document d_j is assigned to the class c_i with
the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
Although this method has an intuitive justification, it does not have any theoretic underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learn-
ing, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied ex-
tensively [8].
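A minimal sketch of the prototype computation in Eq. (2.6), assuming documents are already represented as TF-IDF weight dictionaries (the function name and the default β and γ values are illustrative choices, not prescribed by the equation):

```python
def rocchio_prototype(pos_docs, neg_docs, beta=0.75, gamma=0.15):
    """Eq. (2.6): combine the TF-IDF vectors of positive and negative
    training examples into a single prototype vector for one class."""
    proto = {}
    # Positive examples pull the prototype towards them (weight beta),
    # negative examples push it away (weight gamma).
    for docs, coef in ((pos_docs, beta / len(pos_docs)),
                       (neg_docs, -gamma / len(neg_docs) if neg_docs else 0.0)):
        for d in docs:
            for term, w in d.items():
                proto[term] = proto.get(term, 0.0) + coef * w
    return proto
```

A new document would then be assigned to the class whose prototype yields the highest cosine similarity (Eq. 2.5) with the document vector.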
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various ma-
chine learning methods are other examples of techniques used to perform content-based rec-
ommendation. These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d be-
longing to a class c, using a set of probabilities previously calculated from the observed data, or
training data, as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c) \, P(d|c)}{P(d)} \qquad (2.7)
When performing classification, each document d is assigned to the class c_j with the highest
probability:

\arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)} \qquad (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant and irrelevant
documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is deter-
mined based on individual word occurrences, rather than on the document as a whole. This simplifica-
tion is needed due to the fact that it is very unlikely to see the exact same document more than once;
without it, the observed data would not be enough to generate good probabilities. Although this sim-
plification clearly violates the conditional independence assumption, since terms in a document are
not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves
very good results when classifying text documents. Two different models are commonly used when
working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli
event model, encodes each word as a binary attribute, relating to the appearance of
words in a document. The second, typically referred to as the multinomial event model, identifies the
number of times the words appear in the document. These models see the document as a vector
of values over a vocabulary V, and they both lose the information about word order. Empirically,
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model,
especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \qquad (2.9)
In the formula, N(d_i,t_k) represents the number of times the word, or term, t_k appears in document d_i.
Therefore, only the words from the vocabulary V that appear in the document, t_k ∈ V_{d_i}, are used.
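The multinomial model of Eq. (2.9) can be sketched as follows. The computation is done in log space to avoid numerical underflow, and Laplace smoothing is added so that unseen terms do not zero out the product; the smoothing and all names are standard practical refinements assumed here, not part of the equation above:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labeled_docs):
    """Estimate P(c) and per-class term counts from (class, terms) pairs."""
    class_counts = Counter(c for c, _ in labeled_docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for c, terms in labeled_docs:
        term_counts[c].update(terms)
        vocab.update(terms)
    priors = {c: n / len(labeled_docs) for c, n in class_counts.items()}
    return priors, term_counts, vocab

def classify(doc_terms, priors, term_counts, vocab):
    """Eq. (2.9) in log space: pick the class maximizing
    log P(c) + sum_k N(d, t_k) * log P(t_k | c), with Laplace smoothing."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(term_counts[c].values()) + len(vocab)
        score = math.log(prior)
        for t, n in Counter(doc_terms).items():
            score += n * math.log((term_counts[c][t] + 1) / total)
        if score > best_score:
            best, best_score = c, score
    return best
```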
Decision trees and nearest neighbor methods are other examples of important learning algo-
rithms used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning training data into subgroups, until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes represent labelled terms.
Branches originating from them are labelled according to tests done on the weight that the term
has in the document, and leaves are then labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new,
unlabeled item, the algorithm compares it to all stored items using a similarity function, and then
determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data: the Euclidean distance metric is often chosen when working
with structured data, while for items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback
is their inefficiency at classification time, due to the fact that they have no training phase and all
the computation is performed at classification time.
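A k-nearest-neighbor classifier over VSM vectors can be sketched as follows (all names are illustrative; the cosine function is the one from Eq. (2.5)):

```python
import math
from collections import Counter

def cosine(a, b):
    """Eq. (2.5): cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(item, training, k=3):
    """Label an item by majority vote among its k most similar training
    examples; `training` is a list of (vector, label) pairs."""
    neighbours = sorted(training, key=lambda ex: cosine(item, ex[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

The full scan of `training` on every call makes the classification-time cost mentioned above explicit: there is no model to train, but every prediction touches all stored examples.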
These algorithms represent some of the most important methods used in content-based recom-
mendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed au-
tomatically by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before; moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user pro-
files, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a par-
ticular user based on the items previously rated by other users. This approach is also known as the
"wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes, or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various algo-
rithms and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the recom-
mendation. These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are essen-
tially heuristics that make rating predictions based on the entire collection of items previously rated
by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to gener-
ate a prediction. The set of ratings S previously mentioned represents the ratings given by User1,
User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would
give to Item5. In the simplest case, the predicted rating is computed as the average of the values
contained in set S. However, the most common approach is to use a weighted sum, where the level
of similarity between users defines the weight to use when computing the rating. For example,
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction. The similarity measure between users is used to simplify the rating estimation procedure
[12]. Two users have a high similarity value when they both rate the same group of items in an iden-
tical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional
space, where m represents the number of rated items in common. The similarity measure results
from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} \, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2} \sqrt{\sum_{s \in S} r_{b,s}^2}} \qquad (2.10)
In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave
to the same item. However, this measure does not take into consideration an important factor,
namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a
similar way; the difference in rating values between the four items is practically consistent. With
the cosine similarity measure, these users are considered highly similar, which may not always be
the case, since only the items they have in common are contemplated. In fact, if Alice usually rates
items with low values, we can conclude that these four items are amongst her favourites. On the
other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It
is then clear that the average ratings of each user should be analyzed, in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-
based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S}(r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{a,s} - \bar{r}_a)^2 \sum_{s \in S}(r_{b,s} - \bar{r}_b)^2}} \qquad (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.

With the similarity values between Alice and the other users, obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \times (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)
In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users
most similar to user a that rated item p. This function checks whether the neighbours' ratings for Alice's
unseen Item5 are higher or lower than their averages. The rating differences are combined, using the
similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating.
The value obtained through this procedure corresponds to the predicted rating.
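Eqs. (2.11) and (2.12) can be sketched as follows, over a nested dictionary of user ratings such as the one in Table 2.1. The absolute value in the denominator is a common practical refinement (not written in Eq. (2.12)) that keeps the weights positive when negatively correlated neighbours are included; all function names are illustrative:

```python
import math

def pearson(a, b, ratings):
    """Eq. (2.11): Pearson correlation between users a and b over co-rated items."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common)
                    * sum((ratings[b][s] - mean_b) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, ratings):
    """Eq. (2.12): mean-centred, similarity-weighted prediction of a's rating."""
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    neighbours = [b for b in ratings if b != a and item in ratings[b]]
    num = den = 0.0
    for b in neighbours:
        sim = pearson(a, b, ratings)
        mean_b = sum(ratings[b].values()) / len(ratings[b])
        num += sim * (ratings[b][item] - mean_b)
        den += abs(sim)
    return mean_a + num / den if den else mean_a
```

Running this over the ratings of Table 2.1 yields a positive correlation between Alice and User1, a negative one between Alice and User4, and a prediction for Alice's rating of Item5 above her own average, as the text above anticipates.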
Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one com-
mon strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand using the precomputed similarities. Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing ratings from them [15]. Empiri-
cal evidence has been presented suggesting that item-based algorithms can provide, with better
computational performance, comparable or better quality results than the best available user-based
collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new-user problem, also known as the cold-start problem, also occurs in collaborative meth-
ods. The system must first learn the user's preferences from previously rated items in order to
perform accurate recommendations. Several techniques have been proposed to address this prob-
lem: most of them use the hybrid recommendation approach presented in the next section, while other
techniques use strategies based on item popularity, item entropy, user personalization, and combi-
nations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until
a new item is rated by a sufficient number of users, the recommender system will not recommend
it. Hybrid methods can also address this problem. Data sparsity is another problem that should
be considered, as the number of rated items is usually very small when compared to the number of
ratings that need to be predicted. User profile information, like age, gender, and other attributes, can
also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limita-
tions. The idea behind hybrid systems [19] is to combine two or more different elements, in order to
avoid some shortcomings and even reach desirable properties not present in the individual approaches.
Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or knowl-
edge sources from each strategy to generate a recommendation. This design is an example of
content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
associated with content features (eg comedies liked by user or dramas liked by user) in order to
improve the results
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, otherwise use collaborative methods).
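A parallelized design with a fixed weighting scheme can be sketched in a few lines of Python. The function and component names below are illustrative, not part of any system described in this document:

```python
def weighted_hybrid(predictors, weights, user, item):
    """Parallelized hybrid: each component scores (user, item)
    independently and the scores are combined with fixed weights."""
    return sum(w * p(user, item) for p, w in zip(predictors, weights))

# Two toy components: a popularity-based recommender and a collaborative one.
popularity = lambda user, item: {"pizza": 4.0, "salad": 3.0}[item]
collaborative = lambda user, item: {"pizza": 2.0, "salad": 5.0}[item]

score = weighted_hybrid([popularity, collaborative], [0.6, 0.4], "ana", "salad")
# 0.6 * 3.0 + 0.4 * 5.0 = 3.8
```

Learning the weights dynamically would replace the fixed `[0.6, 0.4]` with values fitted on held-out ratings.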
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good, or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

Precision = tp / (tp + fp)    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = tp / (tp + fn)    (2.14)
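The two measures can be written as one-line functions; this is a minimal sketch, with counts supplied directly rather than computed from a recommendation list:

```python
def precision(tp, fp):
    # Eq. (2.13): relevant recommended items over all recommended items.
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. (2.14): relevant recommended items over all relevant items.
    return tp / (tp + fn)

# Example: 8 true positives, 2 false positives, 8 false negatives.
p, r = precision(8, 2), recall(8, 8)  # 0.8 and 0.5
```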
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = (1/n) Σ_{i=1..n} |p_i − r_i|    (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = sqrt( (1/n) Σ_{i=1..n} (p_i − r_i)² )    (2.16)
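Both error measures are straightforward to compute; the following sketch takes parallel lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    # Eq. (2.15): mean absolute deviation between predictions and ratings.
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Eq. (2.16): squaring the deviations penalizes large errors more.
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))
```

For predictions [4, 3, 5] against actual ratings [3, 3, 1], MAE is 5/3 ≈ 1.67 while RMSE ≈ 2.38; the single large deviation on the last item dominates the RMSE, illustrating the extra emphasis on larger errors.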
15
The RMSE measure was used in the famous Netflix competition1 where a prize of $1000000
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10 compared to Netflixrsquos own recommendation algorithm at the time called Cinematch
1httpwwwnetflixprizecom
16
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:

I+_k = FF_k × IRF_k    (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D    (3.2)

The notion of IDF (inverse document frequency) is specified, in Eq. (3.3), through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = log(M / M_k)    (3.3)
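Equations (3.1)-(3.3) can be sketched directly in Python, representing each recipe as a set of ingredients. The function names and the toy corpus are illustrative, not from [3]:

```python
import math

def irf(k, recipes):
    # Eq. (3.3): M recipes in total, M_k of them contain ingredient k.
    m_k = sum(1 for r in recipes if k in r)
    return math.log(len(recipes) / m_k)

def favourite_score(k, f_k, d, recipes):
    # Eqs. (3.1)-(3.2): FF_k = F_k / D, then I+_k = FF_k * IRF_k.
    return (f_k / d) * irf(k, recipes)

recipes = [{"egg", "rice"}, {"egg", "pork"}, {"mint", "rice"}, {"egg"}]
# "mint" appears in fewer recipes than "egg", so for equal frequency of
# use it receives a higher favourite-ingredient score.
```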
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire; responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+_k, were computed. The F-measure is computed as follows:

F-measure = (2 × Precision × Recall) / (Precision + Recall)    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%; however, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences; when performing a recommendation, the system now also considers the ingredient quantities of a target recipe.

When considering ingredient proportions, the impact on a recipe of 100 grams of two different

¹ http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:

σ_k = sqrt( (1/n) Σ_{i=1..n} (g_k(i) − ḡ_k)² )    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I−_k, respectively):

Score(R) = Σ_{k∈R} I_k · W_k    (3.6)
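Equations (3.5) and (3.6) can be sketched as follows; the preference and weight dictionaries stand in for I_k and W_k, and their values are made up for illustration:

```python
import math

def std_quantity(quantities):
    # Eq. (3.5): standard deviation of ingredient k's quantity over the
    # n recipes that contain it; `quantities` holds g_k(i) for those recipes.
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe, preference, weight):
    # Eq. (3.6): Score(R) = sum of I_k * W_k over the ingredients k in R.
    return sum(preference[k] * weight[k] for k in recipe)

# A disliked ingredient ("pepper", I_k = -1.0) with a high dispersion-based
# weight pulls the recipe score down more than a mildly liked one lifts it.
score = recipe_score({"pepper", "potato"},
                     {"pepper": -1.0, "potato": 0.5},
                     {"pepper": 2.0, "potato": 1.0})
```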
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension of this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm: user-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

v_{u,i} = r_{u,i}, if user u rated item i
          c_{u,i}, otherwise    (3.7)
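Equation (3.7) amounts to a simple fill-in step; a minimal sketch, where `content_prediction` stands in for whatever content-based predictor supplies c_ui:

```python
def pseudo_ratings(actual, content_prediction, items):
    # Eq. (3.7): keep the real rating r_ui where the user rated item i,
    # and fall back to the content-based prediction c_ui everywhere else.
    return {i: actual[i] if i in actual else content_prediction(i)
            for i in items}

# The user rated only "a"; "b" and "c" are filled with content-based scores,
# producing a dense row of the pseudo-ratings matrix V.
dense = pseudo_ratings({"a": 5.0}, lambda i: 3.0, ["a", "b", "c"])
```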
Using the pseudo user-ratings vectors of all users, the dense pseudo-ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.

Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix; this matrix is then used by the collaborative approach to generate recommendations. The two strategies differ in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, on the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve: as mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction: in some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others already known by the user probably carry the same information and will not help the user to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests: the main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].

This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

¹ https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / sqrt( Σ_{p∈P} (r_{a,p} − r̄_a)² · Σ_{p∈P} (r_{b,p} − r̄_b)² )    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are recipe a's and recipe b's average ratings, respectively.

After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = Σ_{b∈N} sim(a, b) · (r_{u,b} − r̄_b) / Σ_{b∈N} sim(a, b)    (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the weighted deviations are normalized by the sum of similarities to compute the predicted rating.
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence suggests that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user-profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
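Since both vectors are binary, they can be held as sets of active features, which makes the cosine computation compact. A minimal sketch (the feature names are invented for illustration):

```python
import math

def build_profile(rated_recipes, positive_threshold=4):
    # Add every feature of a positively rated recipe (rating >= 4)
    # to the binary user-profile vector.
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile |= features
    return profile

def cosine(v, w):
    # Cosine similarity between two binary sparse vectors held as sets:
    # dot product = |intersection|, norm = sqrt(|set|).
    if not v or not w:
        return 0.0
    return len(v & w) / math.sqrt(len(v) * len(w))

profile = build_profile([({"italian", "pasta"}, 5), ({"asian", "rice"}, 2)])
similarity = cosine(profile, {"pasta", "tomato"})  # shares only "pasta"
```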
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = avgTotal + 0.5, if similarity > 0.8
         avgTotal, otherwise    (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined from the complete dataset.
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered a negative observation and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the feature weights determined by the IRF value are added to the vector; in negative observations, the feature weights are subtracted.
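The prototype construction described above can be sketched as follows; the `is_positive` predicate stands in for either of the two approaches (fixed thresholds or the user's average), and the weight values are invented for illustration:

```python
def build_prototype(rated_recipes, irf_weight, is_positive):
    # Positive observation: add each feature's IRF weight to the vector;
    # negative observation: subtract it (both observation types carry an
    # equal weight of 1, as in the experiments described above).
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if is_positive(rating) else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf_weight[k]
    return prototype

weights = {"garlic": 1.2, "mint": 0.4}
rated = [({"garlic"}, 5), ({"garlic", "mint"}, 1)]
# One positive and one negative observation of "garlic" cancel out;
# "mint" ends up with a negative weight in the prototype.
proto = build_prototype(rated, weights, lambda r: r >= 3)
```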
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula³:

B = ((A − min(A)) / (max(A) − min(A))) × (D − C) + C    (4.5)

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
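The per-user mapping of Eq. (4.5) reduces to a single expression; a minimal sketch, with the bounds passed in explicitly rather than computed from the user's history:

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max):
    # Eq. (4.5): map A = similarity from the user's observed similarity
    # interval [min(A), max(A)] onto the user's rating range [C, D].
    return ((similarity - sim_min) / (sim_max - sim_min)
            * (rating_max - rating_min) + rating_min)

# A similarity halfway through the user's observed similarity range lands
# halfway through the user's rating range.
rating = min_max_rating(0.5, 0.0, 1.0, 1.0, 5.0)  # 3.0
```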
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation, if similarity ≥ U
         average rating, if L ≤ similarity < U
         average rating − standard deviation, if similarity < L    (4.6)

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user's and the recipe's averages and standard deviations.
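The piecewise rule of Eq. (4.6) can be sketched directly, using as defaults the initial thresholds U = 0.75 and L = 0.25 mentioned below; which average and standard deviation to pass in (user's, recipe's, or combined) selects among the three tested variants:

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    # Eq. (4.6): push the rating one standard deviation above or below
    # the average, depending on where the similarity falls.
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```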
This approach is very intuitive when the similarity value between the recipersquos features and the
3httpaimotionblogspotpt200909data-mining-in-practicehtml
30
Table 41 Statistical characterization for the datasets used in the experiments
Foodcom Epicurious Foodcom EpicuriousNumber of users 24741 8117 Sparsity on the ratings matrix 002 007Number of food items 226025 14976 Avg rating values 468 334Number of rating events 956826 86574 Avg number of ratings per user 3867 1067Number of ratings above avg 726467 46588 Avg number of ratings per item 423 578Number of groups 108 68 Avg number of ingredients per item 857 371Number of ingredients 5074 338 Avg number of categories per item 233 060Number of categories 28 14 Avg number of food groups per item 087 061
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine, with more accuracy, the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe-sharing community.4 The second dataset is composed of data crawled from a website named Epicurious.5 This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after this filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4. http://www.food.com
5. http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website's users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.6 Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6. http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data; instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. Ideally, the process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds, so the process is repeated 5 times; this setup is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
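The 5-fold split described above can be sketched as follows; the shuffling, the seed, and the function name are illustrative assumptions:

```python
import random

def five_fold_splits(ratings, seed=42):
    """Split rating events into 5 folds; each fold serves once as the
    20% validation set while the remaining 80% forms the training set."""
    events = list(ratings)
    random.Random(seed).shuffle(events)
    folds = [events[i::5] for i in range(5)]
    for k in range(5):
        validation = folds[k]
        training = [e for i, fold in enumerate(folds) if i != k for e in fold]
        yield training, validation
```

Each rating event appears in exactly one validation fold, so the five rounds together cover the whole dataset.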
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold Cross-Validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
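For reference, the two measures can be computed directly from the paired lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```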
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                   Epicurious            Food.com
                                  MAE     RMSE         MAE     RMSE
YoLP Content-based component    0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component    0.6454   0.8678      0.3761   0.6834
User Average                    0.6315   0.8338      0.4077   0.6207
Item Average                    0.7701   1.0930      0.4385   0.7043
Combined Average                0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                            Epicurious                           Food.com
                                  Observation     Observation        Observation     Observation
                                  User Average    Fixed Threshold    User Average    Fixed Threshold
                                  MAE     RMSE    MAE     RMSE       MAE     RMSE    MAE     RMSE
User Avg + User Std. Deviation   0.8217  1.0606  0.7759  1.0283     0.4448  0.6812  0.4287  0.6624
Item Avg + Item Std. Deviation   0.8914  1.1550  0.8388  1.1106     0.4561  0.7251  0.4507  0.7207
User/Item Avg + Std. Deviations  0.8304  1.0296  0.7824  0.9927     0.4390  0.6506  0.4324  0.6449
Min-Max                          0.8539  1.1533  0.7721  1.0705     0.6648  0.9847  0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
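A minimal sketch of these baseline predictors, assuming the per-user and per-recipe averages are precomputed dictionaries (names illustrative):

```python
def baseline_predict(user_id, item_id, user_avgs, item_avgs, mode="combined"):
    """Baseline predictors: the user's average rating, the item's average
    rating, or the combined average (UserAvg + ItemAvg) / 2."""
    if mode == "user":
        return user_avgs[user_id]
    if mode == "item":
        return item_avgs[item_id]
    return (user_avgs[user_id] + item_avgs[item_id]) / 2
```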
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                     Epicurious           Food.com
                                    MAE     RMSE        MAE     RMSE
Ingredients + Cuisine + Dietaries  0.7824  0.9927     0.4324  0.6449
Ingredients + Cuisine              0.7915  1.0012     0.4384  0.6502
Ingredients + Dietary              0.7874  0.9986     0.4342  0.6468
Cuisine + Dietary                  0.8266  1.0616     0.4324  0.7087
Ingredients                        0.7932  1.0054     0.4411  0.6537
Cuisine                            0.8553  1.0810     0.5357  0.7431
Dietary                            0.8772  1.0807     0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested; so, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
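A sketch of this per-feature-type storage: three sparse vectors per user, merged on demand before the cosine similarity is computed. It assumes feature keys are namespaced (e.g. "ing:", "cui:", "diet:") so the merge cannot collide; all names are illustrative.

```python
import math

def merge_vectors(*vectors):
    """Merge the per-feature-type prototype vectors (ingredients, cuisines,
    dietaries) kept for each user into one vector for a given feature test."""
    merged = {}
    for vec in vectors:
        merged.update(vec)  # keys are assumed disjoint across feature types
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing, say, Ingredients + Cuisine then amounts to merging only those two stored vectors, with no rebuilding.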
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:
Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if L <= similarity < U
         average rating - standard deviation,  if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help explain the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher; since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews is made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1,571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, the small dimension of the dataset means there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks
Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
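The simulation loop described above can be sketched generically; train_model and evaluate stand in for the prototype-vector construction and the error measurement, and are illustrative placeholders:

```python
def learning_curve(user_ratings, train_model, evaluate):
    """Simulate incremental learning: start with one review in the training
    set, then move one review at a time from the validation set into the
    training set, recording the error after each round."""
    errors = []
    for n in range(1, len(user_ratings)):
        training, validation = user_ratings[:n], user_ratings[n:]
        model = train_model(training)
        errors.append(evaluate(model, validation))
    return errors
```

With a toy "model" that just averages the training ratings, the returned list traces the error as the simulated user keeps reviewing.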
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which are chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that using all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
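A minimal sketch of this future-work idea, with one class vector per rating value (all names are illustrative, not an implemented design):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(recipe_vec, class_vectors):
    """One class vector per rating value: the predicted rating is the class
    whose vector is most similar to the recipe's feature vector, so no
    similarity-to-rating transformation is needed."""
    return max(class_vectors, key=lambda r: cosine(recipe_vec, class_vectors[r]))
```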
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in Computer Science and Information Systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of Lecture Notes in Computer Science, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of Lecture Notes in Computer Science, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Contents
Acknowledgments iii
Resumo v
Abstract vii
List of Tables xi
List of Figures xiii
Acronyms xv
1 Introduction 1
1.1 Dissertation Structure 2
2 Fundamental Concepts 3
2.1 Recommendation Systems 3
2.1.1 Content-Based Methods 4
2.1.2 Collaborative Methods 9
2.1.3 Hybrid Methods 12
2.2 Evaluation Methods in Recommendation Systems 14
3 Related Work 17
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation 17
3.2 Content-Boosted Collaborative Recommendation 19
3.3 Recommending Food: Reasoning on Recipes and Ingredients 21
3.4 User Modeling for Adaptive News Access 22
4 Architecture 25
4.1 YoLP Collaborative Recommendation Component 25
4.2 YoLP Content-Based Recommendation Component 27
4.3 Experimental Recommendation Component 28
4.3.1 Rocchio's Algorithm using FF-IRF 28
4.3.2 Building the Users' Prototype Vector 29
4.3.3 Generating a Rating Value from a Similarity Value 29
4.4 Database and Datasets 31
5 Validation 35
5.1 Evaluation Metrics and Cross Validation 35
5.2 Baselines and First Results 36
5.3 Feature Testing 38
5.4 Similarity Threshold Variation 39
5.5 Standard Deviation Impact in Recommendation Error 42
5.6 Rocchio's Learning Curve 43
6 Conclusions 47
6.1 Future Work 48
Bibliography 49
List of Tables
2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe - ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Food.com rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10-Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using the Epicurious dataset 39
5.3 Lower similarity threshold variation test using the Food.com dataset 40
5.4 Upper similarity threshold variation test using the Epicurious dataset 40
5.5 Upper similarity threshold variation test using the Food.com dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes 44
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes 44
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Square Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Building on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones appear every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can be applied to food recommendation, is therefore a topic of great interest.
In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods, adapted to food recommendation, is validated with performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as analogous to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview on recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given, and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, which is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview on hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects its relevance to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:
w_{ij} = TF_{ij} \times IDF_i \qquad (2.3)
It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when assigning a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:
w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \qquad (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, presented in Eq. (2.5), is commonly used:
Similarity(a, b) = \frac{\sum_k w_{ka}\, w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \qquad (2.5)
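As an illustration of the scheme above, the following Python sketch computes TF-IDF weights (Eqs. 2.1-2.3) and the cosine similarity of Eq. (2.5) over a toy collection of recipes represented as bags of ingredient tokens. The data and function names are illustrative, not part of the system developed in this work.

```python
import math

def tf(term, doc):
    # Term frequency, normalized by the most frequent term in the document (Eq. 2.1)
    counts = {t: doc.count(t) for t in set(doc)}
    return counts.get(term, 0) / max(counts.values())

def idf(term, docs):
    # Inverse document frequency (Eq. 2.2); assumes the term occurs in at least one document
    n_i = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_i)

def tfidf_vector(doc, docs, vocab):
    # One TF-IDF weight per vocabulary term (Eq. 2.3)
    return [tf(t, doc) * idf(t, docs) for t in vocab]

def cosine(a, b):
    # Cosine similarity between two weight vectors (Eq. 2.5)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy recipes as bags of ingredient tokens (invented example data)
recipes = [["salt", "pepper", "garlic"],
           ["salt", "sugar", "flour"],
           ["garlic", "olive_oil", "pepper"]]
vocab = sorted({t for r in recipes for t in r})
v0 = tfidf_vector(recipes[0], recipes, vocab)
v2 = tfidf_vector(recipes[2], recipes, vocab)
similarity = cosine(v0, v2)  # positive, since both recipes share garlic and pepper
```

Note that "salt", appearing in two of the three recipes, receives a lower IDF than "olive_oil", which appears in only one, mirroring the intuition that rare keywords are more discriminative.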
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, with T being the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. The parameters \beta and \gamma control the influence of the positive and negative examples. A document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
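A minimal sketch of the prototype computation in Eq. (2.6). The \beta and \gamma values below are commonly cited defaults, and the example vectors are invented; neither corresponds to the parameters actually used in this work.

```python
def rocchio_prototype(pos_vectors, neg_vectors, beta=16.0, gamma=4.0):
    # Prototype vector for one class (Eq. 2.6); beta/gamma values are illustrative
    dim = len(pos_vectors[0])
    proto = []
    for k in range(dim):
        pos = sum(v[k] for v in pos_vectors) / len(pos_vectors)
        neg = sum(v[k] for v in neg_vectors) / len(neg_vectors) if neg_vectors else 0.0
        proto.append(beta * pos - gamma * neg)
    return proto

# Two liked documents and one disliked document, as TF-IDF vectors over a 3-term vocabulary
liked = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
disliked = [[0.0, 0.2, 0.8]]
prototype = rocchio_prototype(liked, disliked)
# Terms frequent in the liked documents receive high positive weights,
# while terms from the disliked one are pushed negative
```

A new document would then be assigned to whichever class prototype it is most cosine-similar to, as described in the text.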
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \qquad (2.7)
When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \qquad (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, signalling the appearance of the word in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \qquad (2.9)
In the formula, N(d_i, t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document, t_k \in V_{d_i}, are used.
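The multinomial model of Eq. (2.9) can be sketched as follows. The Laplace smoothing term (alpha) and the log-space scoring are standard practical additions not present in the equation itself, and the example documents and class labels are invented.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    # docs: lists of tokens; alpha is Laplace smoothing (a practical addition, not in Eq. 2.9)
    vocab = {t for d in docs for t in d}
    priors, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = len(class_docs) / len(docs)                  # P(c_j)
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + alpha) / (total + alpha * len(vocab))
                   for t in vocab}                               # P(t_k | c_j)
    return priors, cond

def classify_nb(doc, priors, cond):
    # Eq. 2.9 in log space, to avoid floating-point underflow on long documents
    scores = {c: math.log(p) + sum(math.log(cond[c][t]) for t in doc if t in cond[c])
              for c, p in priors.items()}
    return max(scores, key=scores.get)

priors, cond = train_nb([["good", "tasty"], ["tasty", "fresh"], ["bland", "awful"]],
                        ["relevant", "relevant", "irrelevant"])
label = classify_nb(["tasty", "fresh"], priors, cond)  # classified as "relevant"
```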
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of documents, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion when learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all the training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of these nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, the cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they have no training phase and all the computation is performed at classification time.
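A minimal nearest-neighbor classifier along the lines described above, using cosine similarity over VSM vectors and a majority vote. The training vectors and labels are invented for illustration.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    # Cosine similarity, the usual choice for VSM-represented items
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(item, training, k=3):
    # training: list of (vector, label) pairs; majority vote over the k most similar items
    nearest = sorted(training, key=lambda pair: cosine_sim(item, pair[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented training items: VSM vectors tagged with the user's reaction
training = [([1.0, 0.0, 0.1], "liked"), ([0.9, 0.1, 0.0], "liked"),
            ([0.0, 1.0, 0.9], "disliked"), ([0.1, 0.8, 1.0], "disliked")]
prediction = knn_classify([0.95, 0.05, 0.05], training, k=3)
```

The full scan over `training` at query time is exactly the classification-time cost discussed above: there is no training phase, so every prediction pays for all comparisons.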
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\,\sqrt{\sum_{s \in S} r_{bs}^2}} \qquad (2.10)
In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave to the same item s. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items common to both of them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \qquad (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)
In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the result is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
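The procedure above can be reproduced directly on the ratings of Table 2.1. The sketch below computes Pearson similarities (Eq. 2.11, with each user's mean taken over all of their ratings) and a prediction for Alice's unseen Item5 (Eq. 2.12), using two positively correlated neighbours (User1 and User2) as the neighbourhood N.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean_rating(user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def pearson(a, b):
    # Pearson correlation over the items rated by both users (Eq. 2.11)
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = mean_rating(a), mean_rating(b)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common)) * \
          math.sqrt(sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, neighbours):
    # Weighted deviation from each neighbour's mean rating (Eq. 2.12)
    num = sum(pearson(a, b) * (ratings[b][item] - mean_rating(b)) for b in neighbours)
    den = sum(pearson(a, b) for b in neighbours)
    if not den:
        return mean_rating(a)
    return mean_rating(a) + num / den

prediction = predict("Alice", "Item5", ["User1", "User2"])
# Both neighbours rated Item5 above their own averages, so the prediction
# lands above Alice's mean of 4
```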
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence suggests that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem; most of them use the hybrid recommendation approach presented in the next section, while others use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some of the shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items, otherwise use collaborative methods).
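A parallelized design with a static weighting scheme can be sketched as follows. The component functions and the weight values are placeholders standing in for, e.g., a content-based and a collaborative predictor; they are not components of the system developed in this work.

```python
def weighted_hybrid(user, item, components, weights):
    # components: functions (user, item) -> predicted rating, all run on the same input
    # weights: one static weight per component, assumed here to sum to 1
    return sum(w * component(user, item) for component, w in zip(components, weights))

# Hypothetical component predictors returning fixed ratings, for illustration only
content_based = lambda user, item: 4.0
collaborative = lambda user, item: 3.0
score = weighted_hybrid("Alice", "Item5", [content_based, collaborative], [0.6, 0.4])
# 0.6 * 4.0 + 0.4 * 3.0 = 3.6
```

A voting scheme would instead pick the majority (or highest-confidence) output, and dynamically learned weights would replace the fixed 0.6/0.4 split.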
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is then exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation task is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
\[ \text{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
\[ \text{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
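As a minimal illustration, the two measures reduce to a few lines of Python (the function names are ours, chosen for this sketch, not from any particular library):

```python
def precision(tp, fp):
    # Eq. (2.13): fraction of recommended items that are relevant.
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. (2.14): fraction of relevant items that were recommended.
    return tp / (tp + fn)

# Hypothetical confusion counts: 8 true positives, 2 false positives,
# 4 false negatives.
print(precision(8, 2))  # 0.8
```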
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
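Both error measures can be sketched directly from Eqs. (2.15) and (2.16); the three example ratings below are hypothetical:

```python
import math

def mae(predicted, actual):
    # Eq. (2.15): mean absolute deviation between predictions and ratings.
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Eq. (2.16): square root of the mean squared deviation; large errors
    # weigh more than in MAE.
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

preds = [4.0, 3.0, 5.0]
truth = [5.0, 3.0, 3.0]
print(mae(preds, truth))   # 1.0
print(rmse(preds, truth))  # ~1.291: the single 2-point error dominates
```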
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
¹ http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I⁺_k, an equation based on the idea of TF-IDF is used:
\[ I^{+}_{k} = FF_k \times IRF_k \tag{3.1} \]
FF_k is the frequency of use (F_k) of ingredient k during a period D:
\[ FF_k = \frac{F_k}{D} \tag{3.2} \]
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]
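Putting Eqs. (3.1)-(3.3) together, a minimal sketch of the FF-IRF scoring might look as follows. The representation of recipes as ingredient sets and the function names are our assumptions, not from [3]:

```python
import math
from collections import Counter

def irf_weights(recipes):
    # recipes: list of ingredient sets. IRF_k = log(M / M_k), Eq. (3.3),
    # where M is the total number of recipes and M_k the number containing k.
    M = len(recipes)
    counts = Counter(k for recipe in recipes for k in recipe)
    return {k: math.log(M / m) for k, m in counts.items()}

def favourite_ingredients(used, period_days, irf, top_n=3):
    # used maps ingredient -> F_k, the number of times the user cooked with it.
    # I+_k = FF_k * IRF_k, with FF_k = F_k / D, Eqs. (3.1)-(3.2).
    scores = {k: (f / period_days) * irf.get(k, 0.0) for k, f in used.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# "rice" appears in every recipe, so its IRF (and hence its score) is zero,
# even though the user cooked with it often.
recipes = [{"egg", "rice"}, {"rice", "pork"}, {"rice", "basil"}]
irf = irf_weights(recipes)
print(favourite_ingredients({"rice": 10, "egg": 1}, 30, irf, top_n=1))  # ['egg']
```

The example shows the intended effect of the IRF term: ubiquitous ingredients are discounted in favour of distinctive ones.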
The user's disliked ingredients I⁻_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From this set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they liked to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I⁺_k, were computed. The F-measure is computed as follows:
\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I⁺_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
¹ http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \overline{g_k}\right)^2} \tag{3.5} \]
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \(\overline{g_k}\) represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I⁺_k and I⁻_k, respectively):
\[ \text{Score}(R) = \sum_{k \in R} I_k \cdot W_k \tag{3.6} \]
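A small sketch of Eqs. (3.5) and (3.6) follows. Note that the exact mapping from the deviation score σ_k to the weight W_k is not detailed in this summary, so the weights are passed in as given; all names are illustrative:

```python
import math

def ingredient_std(quantities):
    # Eq. (3.5): standard deviation of ingredient k's quantity over the
    # n recipes that contain it.
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe, preference, weight):
    # Eq. (3.6): Score(R) = sum of I_k * W_k over the ingredients k in R,
    # where I_k is the signed user preference (positive for liked,
    # negative for disliked) and W_k the quantity-based weight.
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0) for k in recipe)

# A disliked ingredient with a high weight drags the recipe's score down
# more than a mildly liked one lifts it.
print(recipe_score({"pepper", "potato"},
                   {"pepper": -1.0, "potato": 0.5},
                   {"pepper": 2.0, "potato": 1.0}))  # -1.5
```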
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:
\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
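Eq. (3.7) is simple to express in code; a hypothetical sketch, with ratings stored in dictionaries keyed by item:

```python
def pseudo_ratings(actual, content_predicted, items):
    # Eq. (3.7): keep the real rating r_ui where the user provided one,
    # otherwise fall back to the content-based prediction c_ui.
    return [actual[i] if i in actual else content_predicted[i] for i in items]

# The user rated item "a" directly; item "b" is filled in by the
# content-based predictor, densifying the ratings matrix.
print(pseudo_ratings({"a": 5}, {"a": 3, "b": 2}, ["a", "b"]))  # [5, 2]
```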
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating, which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information and will not help him gather more information about a particular news topic; these items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest-neighbor algorithm to model users' short-term interests. As previously mentioned in Section 2.1.1, nearest-neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and a story he already knows need not be recommended. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the same group of users.

¹ https://www.python.org

Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \overline{r_a})(r_{b,p} - \overline{r_b})}{\sqrt{\sum_{p \in P} (r_{a,p} - \overline{r_a})^2 \sum_{p \in P} (r_{b,p} - \overline{r_b})^2}} \tag{4.1} \]
where a and b are recipes, r_{a,p} is the rating given by user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \(\overline{r_a}\) and \(\overline{r_b}\) are the average ratings of recipes a and b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \times r_{u,b}}{\sum_{b \in N} sim(a,b)} \tag{4.2} \]
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of the similarities.
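The two steps above can be sketched as follows. This is a minimal illustration, not the YoLP implementation: ratings are held in plain dictionaries, the item means in Eq. (4.1) are taken over the co-rating users, and the numerator of Eq. (4.2) uses the user's raw ratings:

```python
import math

def item_similarity(ratings_a, ratings_b):
    # Pearson correlation between two recipes (Eq. 4.1), computed over
    # the users P who rated both items.
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common)
                    * sum((ratings_b[p] - mean_b) ** 2 for p in common))
    return num / den if den else 0.0

def predict(user_ratings, target, item_ratings):
    # Eq. (4.2): similarity-weighted average of the user's own ratings,
    # normalized by the sum of the similarities.
    pairs = [(item_similarity(item_ratings[target], item_ratings[b]), r)
             for b, r in user_ratings.items() if b != target]
    denom = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / denom if denom else 0.0
```

With two perfectly correlated recipes, `predict({"b": 4.0}, "a", items)` simply carries the user's rating of `b` over to `a`.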
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are: category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are: temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
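With binary weights, the sparse vectors can be represented simply as feature sets, and the cosine measure reduces to the overlap divided by the product of the vector norms. A minimal sketch of this representation (the feature names and the threshold parameter are illustrative):

```python
import math

def build_profile(rated_recipes, positive_threshold=4):
    # The profile is the union of the features of every recipe the user
    # rated positively (4 or 5), stored with binary weights.
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile |= features
    return profile

def cosine(u, v):
    # Cosine similarity between two binary vectors represented as sets:
    # |intersection| / sqrt(|u| * |v|).
    if not u or not v:
        return 0.0
    return len(u & v) / math.sqrt(len(u) * len(v))

profile = build_profile([({"fish", "asian"}, 5), ({"pork", "german"}, 2)])
print(cosine(profile, {"fish", "thai"}))  # 0.5
```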
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In the collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]
Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating likely introduces a small error in the results. Another approximation stems from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure can be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
\[ IRF_k = \log \frac{M}{M_k} \tag{4.4} \]
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations, and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences, obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
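The update step described above can be sketched in a few lines (a simplified illustration with hypothetical names, assuming the prototype vector is a sparse dictionary of feature weights):

```python
def update_prototype(prototype, recipe_features, irf, positive):
    # Positive observations add the recipe's IRF feature weights to the
    # prototype vector; negative observations subtract them (both with
    # an equal weight of 1, as in the experiments).
    sign = 1.0 if positive else -1.0
    for k in recipe_features:
        prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype

# One positive and one negative observation on "egg" cancel out, leaving
# only the positively observed "basil" weight in the profile.
p = update_prototype({}, {"egg", "basil"}, {"egg": 0.5, "basil": 1.2}, True)
p = update_prototype(p, {"egg"}, {"egg": 0.5}, False)
print(p["egg"], p["basil"])  # 0.0 1.2
```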
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets, with relevant information on the recipes, that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, onto the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
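As a minimal sketch of Eq. (4.5), mapping a per-user similarity range onto a rating range (the fallback for a degenerate interval differs from the thesis, which uses the user average; the midpoint here merely stands in for it):

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max):
    # Eq. (4.5): linearly map the user's observed similarity range
    # [sim_min, sim_max] onto his rating range [rating_min, rating_max].
    if sim_max == sim_min:
        # Degenerate interval (too few ratings): fall back to a default.
        return (rating_min + rating_max) / 2
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min

# A similarity halfway through the user's range lands halfway through
# his rating scale.
print(min_max_rating(0.5, 0.0, 1.0, 1.0, 5.0))  # 3.0
```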
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} \]
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user's and the recipe's averages and standard deviations.
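Eq. (4.6) translates directly into code; a minimal sketch with the initial thresholds as defaults:

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Eq. (4.6): shift the average rating by one standard deviation,
    # up when the similarity is high and down when it is low.
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg

# With avg = 4.0 and std = 0.5, a high similarity yields 4.5, a middling
# one 4.0, and a low one 3.5.
print(threshold_rating(0.8, 4.0, 0.5))  # 4.5
```

The `avg` and `std` arguments can be the user's, the recipe's, or their combination, matching the three approaches tested.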
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
Number of users                         24,741         8,117
Number of food items                   226,025        14,976
Number of rating events                956,826        86,574
Number of ratings above avg            726,467        46,588
Number of groups                           108            68
Number of ingredients                    5,074           338
Number of categories                        28            14
Sparsity on the ratings matrix           0.02%         0.07%
Avg rating values                         4.68          3.34
Avg number of ratings per user           38.67         10.67
Avg number of ratings per item            4.23          5.78
Avg number of ingredients per item        8.57          3.71
Avg number of categories per item         2.33          0.60
Avg number of food groups per item        0.87          0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community⁴. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset has been filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets, after the filter was applied, is presented.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisines, and dietaries. Some examples of these features are:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4httpwwwfoodcom5httpwwwepicuriouscom
31
Figure 43 Distribution of Epicurious rating events per rating values
Figure 44 Distribution of Foodcom rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen
by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures
4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by a discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting aspects of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known
as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
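As an illustration, the 5-fold split described above can be sketched as follows (a minimal sketch; the tuple layout of rating events and the shuffling step are assumptions, not the thesis's exact implementation):

```python
import random

def five_fold_splits(ratings, seed=0):
    """Yield (training, validation) splits over (user, item, rating) events.

    Each of the 5 folds serves once as the 20% validation set, while the
    remaining 80% of the rating events form the training set.
    """
    events = list(ratings)
    random.Random(seed).shuffle(events)  # shuffle so folds are unbiased
    folds = [events[i::5] for i in range(5)]
    for k in range(5):
        validation = folds[k]
        training = [e for i, fold in enumerate(folds) if i != k for e in fold]
        yield training, validation

# Toy example with 10 rating events
data = [("u%d" % (i % 3), "r%d" % i, 1 + i % 5) for i in range(10)]
splits = list(five_fold_splits(data))
print(len(splits))  # 5 folds; each validation set holds 2 of the 10 events
```
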
Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
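The two error measures can be computed directly from paired lists of actual and predicted ratings; a minimal sketch (illustrative values, not the thesis's data):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute deviation from the true rating."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```
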
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item averages,
i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious        Food.com
                                  MAE     RMSE      MAE     RMSE
YoLP Content-based component      0.6389  0.8279    0.3590  0.6536
YoLP Collaborative component      0.6454  0.8678    0.3761  0.6834
User Average                      0.6315  0.8338    0.4077  0.6207
Item Average                      0.7701  1.0930    0.4385  0.7043
Combined Average                  0.6628  0.8572    0.4180  0.6250
Table 5.2: Test Results

                                       Epicurious                            Food.com
                                       Observation     Observation          Observation     Observation
                                       User Average    Fixed Threshold      User Average    Fixed Threshold
                                       MAE     RMSE    MAE     RMSE         MAE     RMSE    MAE     RMSE
User Avg + User Standard Deviation     0.8217  1.0606  0.7759  1.0283       0.4448  0.6812  0.4287  0.6624
Item Avg + Item Standard Deviation     0.8914  1.1550  0.8388  1.1106       0.4561  0.7251  0.4507  0.7207
User/Item Avg + User and Item
Standard Deviation                     0.8304  1.0296  0.7824  0.9927       0.4390  0.6506  0.4324  0.6449
Min-Max                                0.8539  1.1533  0.7721  1.0705       0.6648  0.9847  0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
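The average-based baselines can be sketched as follows (an illustrative sketch; the dictionaries and function names are not YoLP's actual code):

```python
from collections import defaultdict

def build_baselines(ratings):
    """Precompute user and item average ratings from (user, item, rating) events."""
    user_ratings, item_ratings = defaultdict(list), defaultdict(list)
    for user, item, rating in ratings:
        user_ratings[user].append(rating)
        item_ratings[item].append(rating)
    user_avg = {u: sum(v) / len(v) for u, v in user_ratings.items()}
    item_avg = {i: sum(v) / len(v) for i, v in item_ratings.items()}
    return user_avg, item_avg

def combined_average(user_avg, item_avg, user, item):
    """Combined-average baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2

train = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
ua, ia = build_baselines(train)
print(combined_average(ua, ia, "u1", "r1"))  # (3.0 + 4.5) / 2 = 3.75
```
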
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the users' prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the row entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
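For illustration, building a user prototype vector from positive and negative observations might look as follows (a sketch of the classic Rocchio formulation; the beta/gamma coefficients and the handling of ratings exactly at the threshold are assumptions, not the thesis's exact parametrization):

```python
def prototype_vector(rated_recipes, threshold=3, beta=1.0, gamma=1.0):
    """Rocchio-style prototype: add the feature vectors of positively rated
    recipes and subtract those of negatively rated ones.

    `rated_recipes` is a list of (rating, feature_weights) pairs, where
    feature_weights maps a feature name to its weight. Ratings above the
    fixed threshold are treated as positive observations, the rest as
    negative (an illustrative choice).
    """
    pos = [f for r, f in rated_recipes if r > threshold]
    neg = [f for r, f in rated_recipes if r <= threshold]
    proto = {}
    for group, coef in ((pos, beta / max(len(pos), 1)),
                        (neg, -gamma / max(len(neg), 1))):
        for features in group:
            for name, weight in features.items():
                proto[name] = proto.get(name, 0.0) + coef * weight
    return proto

history = [(5, {"garlic": 1.0, "pasta": 0.8}),   # a liked recipe
           (1, {"fish": 1.0, "garlic": 0.2})]    # a disliked recipe
proto = prototype_vector(history)
print(proto)  # garlic ends up with weight 1.0 - 0.2 = 0.8
```
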
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using
the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can
be drawn from these results is that using the combination of both user and item average ratings and
standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                     Epicurious        Food.com
                                     MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries    0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine                0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary                0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                    0.8266  1.0616    0.4324  0.7087
Ingredients                          0.7932  1.0054    0.4411  0.6537
Cuisine                              0.8553  1.0810    0.5357  0.7431
Dietary                              0.8772  1.0807    0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods, it is important to determine whether all features are helping to obtain
the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
row of Table 5.3.
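The merge-then-compare step can be sketched as follows (illustrative names; it assumes that ingredient, cuisine, and dietary feature names do not collide, which holds for these feature sets):

```python
import math

def merge(*vectors):
    """Merge separately stored per-feature prototype vectors (e.g. the
    ingredient, cuisine, and dietary vectors) into the single vector used
    for a given feature-combination test."""
    merged = {}
    for vec in vectors:
        merged.update(vec)  # feature names are assumed disjoint across vectors
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse feature-weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

ingredients = {"garlic": 0.8, "pasta": 0.5}
cuisines = {"Italian": 0.9}
recipe = {"garlic": 1.0, "Italian": 1.0}
print(cosine(merge(ingredients, cuisines), recipe))
```
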
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them, and although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
example, the price of the meal, can increase the correlation between the user's preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L
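Expressed as code, Eq. 4.6 amounts to a three-way branch (a sketch; the argument names are illustrative):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: map Rocchio's similarity onto a rating, using an average
    rating and standard deviation (the user's, the item's, or a combination,
    as in the methods compared above)."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

print(similarity_to_rating(0.8, avg=3.5, std=0.5))  # 4.0 (above upper threshold)
```
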
The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and discover the similarity case thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using Food.com dataset
Figure 5.4: Upper similarity threshold variation test using Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The sharp drop in error value
seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using Food.com dataset

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U      (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help in understanding the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were an MAE of 0.6544 and an RMSE of 0.8601.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
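The MAE-down / RMSE-up effect can be reproduced with a toy example: a predictor that is exact more often, but misses badly when it is wrong, lowers the MAE while raising the RMSE (illustrative numbers, not the thesis's data):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 4, 4, 4]
conservative = [3.5, 3.5, 3.5, 3.5]   # never exact, never far off
aggressive = [4, 4, 4, 2.5]           # exact three times, one large miss

print(mae(actual, conservative), rmse(actual, conservative))  # 0.5 0.5
print(mae(actual, aggressive), rmse(actual, aggressive))      # 0.375 0.75
```

The aggressive predictor wins on MAE but loses on RMSE, which is exactly the trade-off observed when the upper threshold U is lowered.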
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who
attributed the same rating in all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and whether
the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to increase slowly for users with higher standard deviations.
It would not be good if a spike in the absolute error were observed towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Considering the small size of this dataset, and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a determined number of
reviews. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users who rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable number of users to average the
recommendation errors from (see Fig. 5.8). In Food.com, 1,571 users were found who rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes
Figure 5.9: Learning Curve using the Food.com dataset up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning Curve using the Food.com dataset up to 500 rated recipes
a threshold where the recommendation error stagnates.
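The round-by-round simulation described above can be sketched as follows (a sketch with toy stand-ins for the Rocchio training and prediction steps; all names are illustrative):

```python
def learning_curve(user_reviews, build_model, predict, max_train=40):
    """Simulate incremental learning for one user: at round n, the first n
    reviews train the model and the remaining reviews are used to measure
    the mean absolute error, mirroring the test above."""
    errors = []
    for n in range(1, min(max_train, len(user_reviews) - 1) + 1):
        train, val = user_reviews[:n], user_reviews[n:]
        model = build_model(train)
        abs_errs = [abs(r - predict(model, recipe)) for recipe, r in val]
        errors.append(sum(abs_errs) / len(abs_errs))
    return errors

# Toy stand-ins: the "model" is just the user's running average rating.
reviews = [("r%d" % i, 3 + (i % 2)) for i in range(10)]
curve = learning_curve(reviews,
                       build_model=lambda t: sum(r for _, r in t) / len(t),
                       predict=lambda model, _recipe: model)
print(curve[0], curve[-1])
```
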
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22] and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into the rating value needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
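Assuming the Inverse Recipe Frequency follows the standard IDF form (the log of the total number of recipes over the number of recipes containing a feature; the thesis's exact Eq. 4.4 is not reproduced here), the feature weighting can be sketched as:

```python
import math

def irf_weights(recipes):
    """IDF-style Inverse Recipe Frequency: weight each feature by
    log(N / n_f), so features occurring in fewer recipes weigh more.
    `recipes` maps a recipe id to its set of feature names."""
    n = len(recipes)
    counts = {}
    for features in recipes.values():
        for f in features:
            counts[f] = counts.get(f, 0) + 1
    return {f: math.log(n / c) for f, c in counts.items()}

recipes = {
    "carbonara": {"pasta", "egg", "bacon"},
    "amatriciana": {"pasta", "tomato", "bacon"},
    "caprese": {"tomato", "cheese"},
}
w = irf_weights(recipes)
print(w["cheese"] > w["pasta"])  # True: rarer features get higher weights
```
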
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contains the main ingredients, which are chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors, and, adding the major difference in dataset
sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendation, the features that best describe
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e.,
lunch or dinner), total meal cost, total calories, amongst others. The study of the impact that these
features have on the recommendation is another interesting direction to approach in the future,
when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
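This future-work idea can be sketched as follows (a hypothetical structure, not an implemented component: one prototype vector per rating class, with the most similar class giving the predicted rating):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature-weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(class_vectors, recipe_features):
    """Return the rating whose class vector is most similar to the recipe,
    removing the need for a separate similarity-to-rating transformation."""
    return max(class_vectors,
               key=lambda rating: cosine(class_vectors[rating], recipe_features))

classes = {
    5: {"garlic": 0.9, "pasta": 0.7},   # features of recipes the user rated 5
    1: {"fish": 1.0},                   # features of recipes the user rated 1
}
print(predict_rating(classes, {"garlic": 1.0}))  # 5
```
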
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in Computer Science and Information Systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Contents

Acknowledgments iii
Resumo v
Abstract vii
List of Tables xi
List of Figures xiii
Acronyms xv
1 Introduction 1
  1.1 Dissertation Structure 2
2 Fundamental Concepts 3
  2.1 Recommendation Systems 3
    2.1.1 Content-Based Methods 4
    2.1.2 Collaborative Methods 9
    2.1.3 Hybrid Methods 12
  2.2 Evaluation Methods in Recommendation Systems 14
3 Related Work 17
  3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation 17
  3.2 Content-Boosted Collaborative Recommendation 19
  3.3 Recommending Food: Reasoning on Recipes and Ingredients 21
  3.4 User Modeling for Adaptive News Access 22
4 Architecture 25
  4.1 YoLP Collaborative Recommendation Component 25
  4.2 YoLP Content-Based Recommendation Component 27
  4.3 Experimental Recommendation Component 28
    4.3.1 Rocchio's Algorithm using FF-IRF 28
    4.3.2 Building the Users' Prototype Vector 29
    4.3.3 Generating a rating value from a similarity value 29
  4.4 Database and Datasets 31
5 Validation 35
  5.1 Evaluation Metrics and Cross-Validation 35
  5.2 Baselines and First Results 36
  5.3 Feature Testing 38
  5.4 Similarity Threshold Variation 39
  5.5 Standard Deviation Impact on Recommendation Error 42
  5.6 Rocchio's Learning Curve 43
6 Conclusions 47
  6.1 Future Work 48
Bibliography 49
List of Tables

2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe - ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Food.com rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10-Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using Epicurious dataset 39
5.3 Lower similarity threshold variation test using Food.com dataset 40
5.4 Upper similarity threshold variation test using Epicurious dataset 40
5.5 Upper similarity threshold variation test using Food.com dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset 43
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes 44
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes 44
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Square Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them Typically some type of user model is employed to filter the data Based on developments in
Information Filtering (IF) the more modern recommendation systems [2] share the same purpose
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and on online ser-
vices related to movies music books social bookmarking and product sales in general However
new ones are appearing every day All these areas have one thing in common users want to explore
the space of options find interesting items or even discover new things
Still food recommendation is a relatively new area with few systems deployed in real settings
that focus on user preferences The study of current methods for supporting the development of
recommendation systems and how they can apply to food recommendation is a topic of great
interest
In this work the applicability of content-based methods in personalized food recommendation is
explored To do so a recommendation system and an evaluation benchmark were developed The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy level of the predicted ratings In
order to validate the results the experimental component is directly compared with a set of baseline
methods amongst them the YoLP content-based and collaborative components
The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. This work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work other tests were
also performed The algorithmrsquos learning curve and the impact of the standard deviation in the
recommendation error were also analysed Furthermore a feature test was performed to discover
the feature combination that better characterizes the recipes providing the best recommendations
The study of this problem was supported by a scholarship at INOV in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP) and it proposes to create a mobile application that allows a restaurant's customers
to explore the available items in the restaurant's menu, as well as to receive recommendations
specifically adjusted to their personal taste, based on their consumer behaviour. The mobile
application also allows clients to order and pay for the items electronically. To this end, the
recommendation system in YoLP needs to understand the preferences of users, through the analysis of
food consumption data and context, to be able to provide accurate recommendations to a customer of
a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows Chapter 2 provides an overview on recommen-
dation systems introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods In Chapter 3 four previously proposed recommendation
approaches are analysed where interesting features in the context of personalized food recommen-
dation are highlighted In Chapter 4 the modules that compose the architecture of the developed
system are described The recommendation methods are explained in detail and the datasets are
introduced and analysed Chapter 5 contains the details and results of the experiments performed
in this work and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components Lastly in Chapter 6 an overview of the main aspects of this work
is given and a few topics for future work are discussed
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the following chapter on related work. These
concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made recommendation systems are usually classified into the
following categories [2]
bull Knowledge-based recommendation systems
bull Content-based recommendation systems
bull Collaborative recommendation systems
bull Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems Collaborative methods focus more on rating-based rec-
ommendations Content-based approaches instead relate more to classical Information Retrieval
based methods and focus on keywords as content descriptors to generate recommendations Be-
cause of this content-based methods are very popular when recommending documents news arti-
cles or web pages for example
Knowledge-based systems suggest products based on inferences about a user's needs and pref-
erences Two basic types of knowledge-based systems exist [2] Constraint-based and case-based
Both approaches are similar in their recommendation process the user specifies the requirements
and the system tries to identify the solution However constraint-based systems recommend items
using an explicitly defined set of recommendation rules while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the userrsquos requirements Knowledge-based methods are often
used in hybrid recommendation systems since they help to overcome certain limitations for collabo-
rative and content-based systems such as the well-known cold-start problem that is explained later
in this section
In the rest of this section some of the most popular approaches for content-based and collabo-
rative methods are described followed with a brief overview on hybrid recommendation systems
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching up the attributes of an ob-
ject with a user profile finally recommending the objects with the highest match The user profile
can be created implicitly using the information gathered over time from user interactions with the
system or explicitly where the profiling information comes directly from the user Content-based
recommendation systems can analyze two different types of data [5]
bull Structured Data items are described by the same set of attributes used in the user profiles
and the values that these attributes may take are known
bull Unstructured Data attributes do not have a well-known set of values Content analyzers are
usually employed to structure the information
Content-based systems are designed mostly for unstructured data in the form of free-text As
mentioned previously content needs to be analysed and the information in it needs to be trans-
lated into quantitative values so that a recommendation can be made With the Vector Space
Model (VSM) documents can be represented as vectors of weights associated with specific terms
or keywords Each keyword or term is considered to be an attribute and their weights relate to the
relevance associated between them and the document This simple method is an example of how
unstructured data can be approached and converted into a structured representation
There are various term weighting schemes but the Term Frequency-Inverse Document Fre-
quency measure TF-IDF is perhaps the most commonly used amongst them [6] As the name
implies TF-IDF is composed by two terms The first Term Frequency (TF) is defined as follows
\[ TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1) \]
where for a document j and a keyword i fij corresponds to the number of times that i appears in j
This value is divided by the maximum fzj which corresponds to the maximum frequency observed
from all keywords z in the document j
Keywords that are present in various documents do not help in distinguishing different relevance
levels so the Inverse Document Frequency measure (IDF) is also used With this measure rare
keywords are more relevant than frequent keywords IDF is defined as follows
\[ IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2) \]
In the formula N is the total number of documents and ni represents the number of documents in
which the keyword i occurs Combining the TF and IDF measures we can define the TF-IDF weight
of a keyword i in a document j as
\[ w_{ij} = TF_{ij} \times IDF_i \qquad (2.3) \]
It is important to notice that TF-IDF does not identify the context in which the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document and their occurrence in other documents are taken into consideration when giving a
weight to a term.
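As a concrete illustration, the weighting scheme of Eqs. (2.1)-(2.3) can be sketched in a few lines of Python; the function name and the toy documents are illustrative, not part of the system described in this work.

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights (Eqs. 2.1-2.3) for a list of tokenized documents."""
    n_docs = len(docs)
    # n_i: number of documents in which each keyword occurs
    doc_freq = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        max_freq = max(counts.values())
        w = {}
        for term, f in counts.items():
            tf = f / max_freq                        # Eq. 2.1
            idf = math.log(n_docs / doc_freq[term])  # Eq. 2.2
            w[term] = tf * idf                       # Eq. 2.3
        weights.append(w)
    return weights
```

Note that a keyword occurring in every document receives an IDF of zero, exactly as the measure intends.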
Normalizing the resulting vectors of weights obtained from Eq. (2.3) prevents longer documents
from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:
\[ w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K}(TF\text{-}IDF_{zj})^2}} \qquad (2.4) \]
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
\[ Similarity(a,b) = \frac{\sum_k w_{ka}\, w_{kb}}{\sqrt{\sum_k w_{ka}^2}\;\sqrt{\sum_k w_{kb}^2}} \qquad (2.5) \]
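A minimal sketch of the normalization and similarity computations (Eqs. 2.4 and 2.5) over sparse keyword-weight vectors; the dictionary representation is an illustrative choice, not prescribed by the text.

```python
import math

def normalize(weights):
    """Cosine-normalize a sparse weight vector (Eq. 2.4)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {k: w / norm for k, w in weights.items()} if norm else dict(weights)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors (Eq. 2.5)."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```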
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system
according to their information needs, later averaging this information to improve the retrieval.
Rocchio's method can also be used as a classifier for content-based filtering. Documents are
represented as vectors where each component corresponds to a term, usually a word. The weight
attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document
vectors of positive and negative examples are combined into a prototype vector for each class c.
These prototype vectors represent the learning process in this algorithm. New documents are then
classified according to the similarity between the prototype vector of each class and the
corresponding document vector, using for example the well-known cosine similarity metric (Eq. 2.5).
The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector ci = (w1i, ..., w|T|i) for each
class ci, with T the vocabulary composed by the set of distinct terms in the training set. The weight
for each term is given by the following formula:
\[ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} \;-\; \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6) \]
In the formula, POSi and NEGi represent the positive and negative examples in the training set for
class ci, and wkj is the TF-IDF weight for term k in document dj. Parameters β and γ control the
influence of the positive and negative examples. The document dj is assigned to the class ci with
the highest similarity value between the prototype vector ci and the document vector dj.
Although this method has an intuitive justification, it does not have any theoretical underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learning,
a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied
extensively [8].
Classifiers
Aside from the keyword-based techniques presented above Bayesian classifiers and various ma-
chine learning methods are other examples of techniques also used to perform content-based rec-
ommendation These approaches use probabilities gathered from previously observed data in order
to classify an object The Naive Bayes Classifier is recognized as an exceptionally well-performing
text classification algorithm [7] This classifier estimates the probability P(c|d) of a document d be-
longing to a class c using a set of probabilities previously calculated using the observed data or
training data as it is commonly called These probabilities are
bull P (c) probability of observing a document in class c
bull P (d|c) probability of observing the document d given a class c
bull P (d) probability of observing the document d
Using these probabilities the probability P(c|d) of having a class c given a document d can be
estimated by applying the Bayes theorem
\[ P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \qquad (2.7) \]
When performing classification each document d is assigned to the class cj with the highest
probability
\[ \underset{c_j}{\operatorname{argmax}} \; \frac{P(c_j)\,P(d|c_j)}{P(d)} \qquad (2.8) \]
The probability P(d) is usually removed from the equation as it is equal for all classes and thus
does not influence the final result Classes could simply represent for example relevant or irrelevant
documents
In order to generate good probabilities the Naive Bayes classifier assumes that P(d|cj) is deter-
mined based on individual word occurrences rather than the document as a whole This simplifica-
tion is needed due to the fact that it is very unlikely to see the exact same document more than once
Without it the observed data would not be enough to generate good probabilities Although this sim-
plification clearly violates the conditional independence assumption since terms in a document are
not theoretically independent from each other experiments show that the Naive Bayes classifier has
very good results when classifying text documents Two different models are commonly used when
working with the Naive Bayes classifier The first typically referred to as the multivariate Bernoulli
event model encodes each word as a binary attribute This encoding relates to the appearance of
words in a document The second typically referred to as the multinomial event model identifies the
number of times the words appear in the document These models see the document as a vector
of values over a vocabulary V and they both lose the information about word order Empirically
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model
especially for large vocabularies [9] This model is represented by the following equation
\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \qquad (2.9) \]
In the formula, N(di,tk) represents the number of times the word or term tk appears in document di.
Therefore, only the words from the vocabulary V that appear in the document (tk ∈ Vdi) are used.
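The multinomial model can be sketched as follows; Laplace (add-one) smoothing is added here as a standard practical safeguard against zero probabilities, and is not part of Eq. (2.9) itself. Function and variable names are illustrative.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    """Train a multinomial Naive Bayes model from tokenized documents.
    Laplace (add-one) smoothing avoids zero probabilities for unseen terms."""
    vocab = {t for d in docs for t in d}
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    term_counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        term_counts[c].update(doc)
    cond = {}
    for c in classes:
        total = sum(term_counts[c].values())
        cond[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab)) for t in vocab}
    return prior, cond

def classify(doc, prior, cond):
    """Assign doc to the class maximizing log P(c) + sum_t N(d,t) log P(t|c)."""
    best, best_score = None, float("-inf")
    for c in prior:
        score = math.log(prior[c])
        for t in doc:
            if t in cond[c]:  # terms outside the vocabulary are ignored
                score += math.log(cond[c][t])
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids numeric underflow when the product in Eq. (2.9) involves many small probabilities.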
Decision trees and nearest neighbor methods are other examples of important learning algo-
rithms used in content-based recommendation systems Decision tree learners build a decision tree
by recursively partitioning training data into subgroups until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes represent labelled terms
Branches originating from them are labelled according to tests done on the weight that the term
has in the document Leaves are then labelled by categories Instead of using weights a partition
can also be formed based on the presence or absence of individual words The attribute selection
criterion for learning trees for text classification is usually the expected information gain [10]
Nearest neighbor algorithms simply store all training data in memory When classifying a new
unlabeled item the algorithm compares it to all stored items using a similarity function and then
determines the nearest neighbor or the k nearest neighbors The class label for the unclassified
item is derived from the class labels of the nearest neighbors The similarity function used by the
algorithm depends on the type of data The Euclidean distance metric is often chosen when working
with structured data For items represented using the VSM cosine similarity is commonly adopted
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important
drawback is inefficiency at classification time, due to the fact that they have no training phase
and all the computation is performed when classifying a new item.
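A sketch of the nearest neighbor procedure described above, using negated Euclidean distance as the similarity function for structured data points (cosine similarity could equally be plugged in for VSM vectors); the function and variable names are illustrative.

```python
import math
from collections import Counter

def knn_classify(item, training, k=3, sim=None):
    """k-nearest-neighbor classification: rank all stored (vector, label)
    pairs by similarity to `item` and take a majority vote among the top k."""
    if sim is None:
        # negated Euclidean distance, so that larger means more similar
        sim = lambda a, b: -math.dist(a, b)
    neighbors = sorted(training, key=lambda pair: sim(item, pair[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

As the text notes, all work happens here at query time: the "training" step is merely storing the labeled vectors.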
These algorithms represent some of the most important methods used in content-based recom-
mendation systems A thorough review is presented in [5 7] Despite their popularity content-based
recommendation systems have several limitations These methods are constrained to the features
explicitly associated with the recommended object and when these features cannot be parsed au-
tomatically by a computer they have to be assigned manually which is often not practical due to
limitations of resources Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user
profiles, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods or collaborative filtering systems try to predict the utility of items for a par-
ticular user based on the items previously rated by other users This approach is also known as the
wisdom of the crowd and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes or preferences, the system has
to be given item ratings either implicitly or explicitly
Collaborative methods are currently the most prominent approach to generate recommendations
and they have been widely used by large commercial websites With the existence of various algo-
rithms and variations these methods are very well understood and applicable in many domains
since the change in item characteristics does not affect the method used to perform the recom-
mendation These methods can be grouped into two general classes [11] namely memory-based
approaches (or heuristic-based) and model-based methods Memory-based algorithms are essen-
tially heuristics that make rating predictions based on the entire collection of previously rated items
by users In user-to-user collaborative filtering when predicting the rating of an unknown item p for
user c a set of ratings S is used This set contains ratings for item p obtained from other users
who have already rated that item usually the N most similar to user c A simple example on how to
generate a prediction and the steps required to do so will now be described
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5 We have that Item5 is unknown to Alice and the recommendation system needs to gener-
ate a prediction The set of ratings S previously mentioned represents the ratings given by User1
User2 User3 and User4 to Item5 These values will be used to predict the rating that Alice would
give to Item5 In the simplest case the predicted rating is computed using the average of the values
contained in set S However the most common approach is to use the weighted sum where the level
of similarity between users defines the weight value to use when computing the rating For example
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction The similarity measure between users is used to simplify the rating estimation procedure
[12] Two users have a high similarity value when they both rate the same group of items in an iden-
tical way With the cosine similarity measure two users are treated as two vectors in m-dimensional
space where m represents the number of rated items in common The similarity measure results
from computing the cosine of the angle between the two vectors
\[ Similarity(a,b) = \frac{\sum_{s \in S} r_{as}\, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\;\sqrt{\sum_{s \in S} r_{bs}^2}} \qquad (2.10) \]
In the formula, ras is the rating that user a gave to item s, and rbs is the rating that user b gave
to the same item. However, this measure does not take an important factor into consideration,
namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a
similar way The difference in rating values between the four items is practically consistent With
the cosine similarity measure these users are considered highly similar which may not always be
the case since only common items between them are contemplated In fact if Alice usually rates
items with low values we can conclude that these four items are amongst her favourites On the
other hand if User1 often gives high ratings to items these four are the ones he likes the least It
is then clear that the average ratings of each user should be analyzed in order to consider the
differences in user behaviour The Pearson correlation coefficient is a popular measure in user-
based collaborative filtering that takes this fact into account
Figure 2.2: Comparing user ratings [2]
\[ sim(a,b) = \frac{\sum_{s \in S}(r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{as} - \bar{r}_a)^2}\;\sqrt{\sum_{s \in S}(r_{bs} - \bar{r}_b)^2}} \qquad (2.11) \]
In the formula, r̄a and r̄b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users obtained using any of these two
similarity measures we can now generate a prediction using a common prediction function
\[ pred(a,p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a,b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)} \qquad (2.12) \]
In the formula, pred(a, p) is the prediction value to user a for item p, and N is the set of users
most similar to user a that rated item p. This function calculates whether the neighbors' ratings for
Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined,
using the similarity scores as weights, and the value is added to or subtracted from Alice's average
rating. The value obtained through this procedure corresponds to the predicted rating.
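The user-based procedure just described (Eqs. 2.11 and 2.12) can be sketched as follows, applied to the ratings of Table 2.1. Two simplifications are assumptions of this sketch, not of the text: all users that rated the item are used as neighbors rather than only the top-N most similar, and the denominator uses absolute similarities, a common safeguard when correlations can be negative.

```python
import math

def pearson(a, b, ratings):
    """Pearson correlation (Eq. 2.11) over the items rated by both users."""
    common = [s for s in ratings[a] if s in ratings[b]]
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][s] - mean_a) * (ratings[b][s] - mean_b) for s in common)
    den = (math.sqrt(sum((ratings[a][s] - mean_a) ** 2 for s in common))
           * math.sqrt(sum((ratings[b][s] - mean_b) ** 2 for s in common)))
    return num / den if den else 0.0

def predict(a, item, ratings):
    """Predict user a's rating for item (Eq. 2.12) from the users that rated it."""
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    num = den = 0.0
    for b in ratings:
        if b == a or item not in ratings[b]:
            continue
        sim = pearson(a, b, ratings)
        mean_b = sum(ratings[b].values()) / len(ratings[b])
        num += sim * (ratings[b][item] - mean_b)
        den += abs(sim)
    return mean_a + num / den if den else mean_a
```

Note that the predicted value can fall slightly outside the rating scale; deployed systems typically clamp it to the valid range.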
Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible According to [12] one com-
mon strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once
in a while since the network of peers usually does not change dramatically in a short period of
time Then when a user requires a recommendation the ratings can be efficiently calculated on
demand using the precomputed similarities Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11 13 14]
The techniques presented above have been traditionally used to compute similarities between
users Sarwar et al proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing rating predictions from them [15].
Empirical evidence has been presented suggesting that item-based algorithms can provide, with better
computational performance, comparable or better quality results than the best available user-based
collaborative filtering algorithms [16 15]
Model-based algorithms use a collection of ratings (training data) to learn a model which is then
used to make rating predictions Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed for example with cluster models where like-minded users are grouped into classes The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression Bayesian networks or various probabilistic modelling techniques amongst others
The new user problem also known as the cold start problem also occurs in collaborative meth-
ods The system must first learn the userrsquos preferences from previously rated items in order to
perform accurate recommendations Several techniques have been proposed to address this prob-
lem Most of them use the hybrid recommendation approach presented in the next section Other
techniques use strategies based on item popularity item entropy user personalization and combi-
nations of the above [12 17 18] New items also present a problem in collaborative systems Until
the new item is rated by a sufficient number of users the recommender system will not recommend
it Hybrid methods can also address this problem Data sparsity is another problem that should
be considered The number of rated items is usually very small when compared to the number of
ratings that need to be predicted User profile information like age gender and other attributes can
also be used when calculating user similarities in order to overcome the problem of rating sparsity
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics but also several limita-
tions The idea behind hybrid systems [19] is to combine two or more different elements in order to
avoid some shortcomings and even reach desirable properties not present in individual approaches
Monolithic parallel and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2]
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate The objective behind this design is to exploit different features or knowl-
edge sources from each strategy to generate a recommendation This design is an example of
content-boosted collaborative filtering [20] where social features (eg movies liked by user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (eg comedies liked by user or dramas liked by user) in order to
improve the results
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components have the same input as if they were working independently A weighting or a voting
scheme is then applied to obtain the recommendation Weights can be assigned manually or learned
dynamically. This design can be applied to two components that perform well individually but
complement each other in different situations (eg when few ratings exist one should recommend
popular items else use collaborative methods)
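A weighting scheme of this kind reduces to a few lines; the two component predictors in the usage example are placeholders for, e.g., a content-based and a collaborative scorer, and are not part of the designs described in [2].

```python
def weighted_hybrid(predictors, weights, user, item):
    """Parallelized hybridization with a static weighting scheme: each
    component scores the (user, item) pair independently and the scores
    are combined through a weighted sum (weights should sum to 1)."""
    return sum(w * p(user, item) for p, w in zip(predictors, weights))
```

For the dynamic variant mentioned above, the weights would be learned (for instance, from held-out rating data) instead of being fixed by hand.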
Pipelined hybridization designs (Figure 25) implement a process in which several techniques
are used sequentially to generate recommendations Two types of strategies are used [2] cascade
and meta-level Cascade hybrids are based on a sequenced order of techniques in which each
succeeding recommender only refines the recommendations of its predecessor In a meta-level
hybridization design one recommender builds a model that is exploited by the principal component
to make recommendations
In the practical development of recommendation systems it is well accepted that all base algo-
rithms can be improved by being hybridized with other techniques It is important that the recom-
mendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
contentcollaborative hybrids regardless of type will always demonstrate the cold-start problem
since both techniques need a database of ratings [19]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives For instance from the
business perspective, many variables can and have been studied: increases in number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective the general interactivity with the platform and click-through-rates can be anal-
ysed From the customer perspective satisfaction levels loyalty and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of rec-
ommendation systems are based on Information Retrieval (IR) measures such as Precision and
Recall [2].
When using IR measures the recommendation is viewed as an information retrieval task where
recommended items like retrieved items are predicted to be good or relevant Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user or established as
“actually good” by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system False positives (fp) designate recommended
items disliked by the user Finally correct omissions also known as true negatives (tn) represent
items correctly not recommended by the system
Precision measures the exactness of the recommendations, i.e. the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):
Figure 27 Evaluating recommended items [2]
\[ Precision = \frac{tp}{tp + fp} \qquad (2.13) \]
Recall measures the completeness of the recommendations, i.e. the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):
\[ Recall = \frac{tp}{tp + fn} \qquad (2.14) \]
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems capturing the accuracy at the level
of the predicted ratings MAE computes the deviation between predicted ratings and actual ratings
\[ MAE = \frac{1}{n}\sum_{i=1}^{n} |p_i - r_i| \qquad (2.15) \]
In the formula n represents the total number of items used in the calculation pi the predicted rating
for item i and ri the actual rating for item i RMSE is similar to MAE but places more emphasis on
larger deviations
\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16) \]
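The four measures above translate directly into code; this is a generic sketch, not the evaluation module of this work.

```python
import math

def precision(tp, fp):
    """Fraction of recommended items that are relevant (Eq. 2.13)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant items that were recommended (Eq. 2.14)."""
    return tp / (tp + fn)

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error; penalizes large deviations more than MAE (Eq. 2.16)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))
```

Note that RMSE is always greater than or equal to MAE on the same data, with equality only when all deviations have the same magnitude.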
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches The
works described in this chapter contain interesting features to further explore in the context of per-
sonalized food recommendation using content-based methods
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients, I+_k, an equation based on the idea of TF-IDF is used:

I+_k = FF_k × IRF_k    (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D    (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe
Frequency, IRF_k, which uses the total number of recipes M and the number of recipes that contain
ingredient k (M_k):

IRF_k = log(M / M_k)    (3.3)
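This FF-IRF scoring can be sketched as follows (a toy corpus of recipes represented as ingredient sets; all names and values are illustrative assumptions):

```python
import math

# Toy recipe corpus: each recipe is a set of ingredients.
recipes = [
    {"tomato", "basil", "pasta"},
    {"tomato", "cheese"},
    {"chicken", "rice"},
    {"tomato", "rice", "pepper"},
]

def irf(ingredient: str) -> float:
    # Inverse Recipe Frequency (Eq. 3.3): log(M / M_k).
    M = len(recipes)
    M_k = sum(1 for r in recipes if ingredient in r)
    return math.log(M / M_k)

def favourite_score(ingredient: str, uses: int, period_days: int) -> float:
    # I+_k = FF_k * IRF_k (Eqs. 3.1-3.2), with FF_k = F_k / D.
    return (uses / period_days) * irf(ingredient)

# "rice" was used twice in a 30-day period; it appears in 2 of the 4 recipes.
print(round(favourite_score("rice", 2, 30), 4))  # 0.0462
```

As with TF-IDF, an ingredient that appears in almost every recipe (high M_k) contributes little, even if the user selects it often.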
The user's disliked ingredients, I−_k, are estimated by considering the ingredients in the browsing
history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad,¹ one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From a set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they liked to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for the users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted
by I+_k, were computed. The F-measure is computed as follows:

F-measure = (2 × Precision × Recall) / (Precision + Recall)    (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of individual users' favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail because the accuracy values
obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with a higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considered the ingredient quantity of a target
recipe.
¹ http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams from two different
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is
obtained as follows:

σ_k = sqrt( (1/n) × Σ_{i=1..n} (g_k(i) − ḡ_k)² )    (3.5)

In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the
quantity of the ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously
computed average quantity of the ingredient k in all the recipes in the database). According to the
deviation score, a weight W_k is assigned to the ingredient. The recipe's final score, Score(R), is
computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and
I−_k, respectively):

Score(R) = Σ_{k∈R} (I_k · W_k)    (3.6)
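A rough sketch of this quantity-aware scoring. Note that the exact mapping from the deviation statistics to the weight W_k is not detailed here, so this sketch simply assumes W_k is the ingredient's standard deviation; all names and values are illustrative:

```python
import math

# Quantities (in grams) of each ingredient across the recipes that contain it.
quantities = {"pepper": [2, 5, 3, 2], "potato": [300, 250, 350, 300]}

def std_dev(values: list[float]) -> float:
    # Population standard deviation of an ingredient's quantities (Eq. 3.5).
    mean = sum(values) / len(values)
    return math.sqrt(sum((g - mean) ** 2 for g in values) / len(values))

def recipe_score(recipe: list[str], preference: dict[str, float]) -> float:
    # Score(R) = sum of I_k * W_k over the recipe's ingredients (Eq. 3.6);
    # here W_k is assumed to be the ingredient's standard deviation.
    return sum(preference[k] * std_dev(quantities[k]) for k in recipe)

# User likes potato (+1.0) and dislikes pepper (-1.0).
prefs = {"pepper": -1.0, "potato": 1.0}
print(recipe_score(["pepper", "potato"], prefs))
```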
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization
problem. The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed as the weighted average of deviations from the neighbors' means. Both of
these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based
predictor and the pure collaborative method to generate predictions.

CBCF basically consists of performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u where available, and
those predicted by the content-based method otherwise:
v_{u,i} = r_{u,i}, if user u rated item i
          c_{u,i}, otherwise    (3.7)
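Building the pseudo user-ratings vector can be sketched as follows (the content-based predictor is stubbed out with a constant, an assumption of this sketch; all names are illustrative):

```python
def content_prediction(user: str, item: str) -> float:
    # Stand-in for the pure content-based predictor (a naive Bayes
    # classifier in [20]); here it simply returns a constant rating.
    return 3.0

def pseudo_ratings(user: str, actual: dict[str, float],
                   items: list[str]) -> dict[str, float]:
    # Eq. 3.7: use the real rating where available, and the content-based
    # prediction otherwise, yielding a dense vector with no missing entries.
    return {i: actual.get(i, content_prediction(user, i)) for i in items}

items = ["movie_a", "movie_b", "movie_c"]
print(pseudo_ratings("u1", {"movie_a": 5.0}, items))
# {'movie_a': 5.0, 'movie_b': 3.0, 'movie_c': 3.0}
```

The resulting dense vectors are what make the subsequent Pearson similarity computation robust to sparse data.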
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the
user has rated many items, the content-based predictions are significantly better than if he has rated
only a few items. Lastly, the prediction is computed using a hybrid correlation weight that allows
similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The
hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ from one
another in the approach used to compute user similarity. The hybrid recipe method identifies a set
of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on
the recipe rating scores after the recipes are broken down into ingredients. Lastly, an intelligent
strategy was implemented. In this strategy, only the positive ratings for items that receive mixed
ratings are considered: it is assumed that common items in recipes with mixed ratings are not the
cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known news article content-based recommendation system. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help him gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function and then determines the nearest
neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.
22
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
does not need a recommendation for a story he already knows. If the story does not have any voters,
it cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].

This issue should be taken into consideration in food recommendations, as usually users are not
interested in recommendations with contents too similar to dishes recently eaten.
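The short-term prediction procedure can be sketched roughly as follows (the threshold values and story vectors are illustrative assumptions, not taken from [23]):

```python
def predict_story_score(candidate, rated_stories, min_sim=0.3, max_sim=0.95):
    """Weighted-average prediction over 'voting' stories, in the style of the
    short-term model described above. rated_stories is a list of
    (tfidf_vector, score) pairs; the thresholds are assumptions of this sketch.
    Returns a (label, prediction) pair."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    voters = [(cosine(candidate, v), s) for v, s in rated_stories]
    voters = [(sim, s) for sim, s in voters if sim > min_sim]
    if not voters:
        return ("unclassified", None)  # falls through to the long-term model
    if any(sim > max_sim for sim, _ in voters):
        return ("known", None)         # too close to an already-seen story
    total = sum(sim for sim, _ in voters)
    return ("predicted", sum(sim * s for sim, s in voters) / total)

rated = [([1.0, 0.0, 1.0], 4.0), ([0.0, 1.0, 0.0], 2.0)]
print(predict_story_score([1.0, 0.2, 1.0], rated))   # nearly identical: "known"
print(predict_story_score([0.5, 0.5, 0.5], rated))   # in between: "predicted"
```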
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental recommendation
component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python.¹
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in
detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity value between a pair
of items is measured from the way they are rated by a shared set of users. In other words, in
user-to-user, two users are considered similar if they rate the same set of items in a similar way,
whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
¹ https://www.python.org
Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
same group of users.

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( sqrt(Σ_{p∈P} (r_{a,p} − r̄_a)²) × sqrt(Σ_{p∈P} (r_{b,p} − r̄_b)²) )    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are recipe a's and recipe b's average
ratings, respectively.

After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = Σ_{b∈N} sim(a, b) × r_{u,b} / Σ_{b∈N} sim(a, b)    (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items
rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted
according to the similarity between b and the target item a, and the result is normalized by the
sum of similarities.
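Both steps can be sketched as follows (toy rating dictionaries; all names are illustrative):

```python
import math

# ratings[item][user] = rating value
ratings = {
    "recipe_a": {"u1": 5, "u2": 3, "u3": 4},
    "recipe_b": {"u1": 4, "u2": 2, "u3": 4},
    "recipe_c": {"u1": 1, "u2": 5},
}

def item_sim(a: str, b: str) -> float:
    # Pearson correlation over the users who rated both items (Eq. 4.1).
    common = ratings[a].keys() & ratings[b].keys()
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den_a = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
    den_b = math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(user: str, target: str) -> float:
    # Similarity-weighted average of the user's own ratings (Eq. 4.2).
    rated = [i for i in ratings if i != target and user in ratings[i]]
    sims = {i: item_sim(target, i) for i in rated}
    norm = sum(sims.values())
    return sum(sims[i] * ratings[i][user] for i in rated) / norm if norm else 0.0

print(round(predict("u3", "recipe_a"), 2))
```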
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is
simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and
compute the predicted ratings from there. Another reason why the item-based collaborative approach
was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented
suggesting that item-based algorithms can provide, with better computational performance, results of
comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of representing recipes
as vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are: category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are: temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
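A minimal sketch of this binary feature-vector matching (the feature vocabulary and example recipes are illustrative assumptions):

```python
import math

# Fixed positions for each feature in the sparse vectors.
FEATURES = ["cat:italian", "cat:dessert", "region:north", "ing:tomato", "ing:basil"]

def to_vector(features: set[str]) -> list[int]:
    # Binary vector over the fixed feature vocabulary.
    return [1 if f in features else 0 for f in FEATURES]

def cosine(a: list[int], b: list[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Profile built from the features of positively rated (4-5 star) recipes.
profile = to_vector({"cat:italian", "ing:tomato", "ing:basil"})
recipe = to_vector({"cat:italian", "region:north", "ing:tomato"})
print(round(cosine(profile, recipe), 3))  # 0.667
```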
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

Rating = avgTotal + 0.5, if similarity > 0.8
         avgTotal, otherwise    (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It
is important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation comes from the
fact that YoLP considers context features at the moment of the recommendation, and these are not
included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm is
presented. Next, two different approaches are introduced to build the users' prototype vectors and,
lastly, the problem of transforming a similarity measure into a rating value is presented, and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in
Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to
be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each, determines the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values are positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3: in this case, these are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in the next Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are directly obtained from the rating events contained in the training set. Depending on the
observation, the recipe's feature weights are added to or subtracted from the user prototype vector.
In positive observations, the recipe's feature weights, determined by the IRF value, are added to the
vector. In negative observations, the feature weights are subtracted.
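The prototype construction can be sketched as follows, using the first (fixed-threshold) approach on the Epicurious 1-4 scale (the IRF weights and feature names shown are illustrative):

```python
from collections import defaultdict

# Precomputed IRF weight per feature (illustrative values).
irf = {"ing:tomato": 0.3, "ing:basil": 1.1, "cat:italian": 0.7}

def build_prototype(rated_recipes):
    """rated_recipes: list of (features, rating) pairs, ratings on a 1-4 scale.
    Adds the features' IRF weights for positive observations (ratings 3-4)
    and subtracts them for negative observations (ratings 1-2)."""
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= 3 else -1.0
        for f in features:
            prototype[f] += sign * irf[f]
    return dict(prototype)

history = [({"ing:tomato", "cat:italian"}, 4),
           ({"ing:basil", "cat:italian"}, 1)]
print(build_prototype(history))
```

Note how "cat:italian" cancels out to 0.0: one liked and one disliked recipe shared it, so it carries no preference signal.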
4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented:
Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.

Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula:³

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So, the following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the
user average was used as the default for the recommendation.
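The per-user Min-Max mapping of Eq. (4.5) can be sketched as follows (the per-user ranges shown are illustrative):

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    # Eq. 4.5: map a similarity in [sim_min, sim_max] into the user's
    # rating range [rating_min, rating_max]. Falls back to the user's
    # average when the similarity interval cannot be computed.
    if sim_max == sim_min:
        return user_avg
    frac = (sim - sim_min) / (sim_max - sim_min)
    return frac * (rating_max - rating_min) + rating_min

# This user's similarities fall in [0.1, 0.9], ratings in [2, 5], average 3.5.
print(min_max_rating(0.5, 0.1, 0.9, 2, 5, 3.5))  # midpoint maps to 3.5
print(min_max_rating(0.9, 0.1, 0.9, 2, 5, 3.5))  # top of the scale maps to 5
print(min_max_rating(0.5, 0.4, 0.4, 2, 5, 3.5))  # degenerate interval: average
```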
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation, if similarity ≥ U
         average rating, if L ≤ similarity < U
         average rating − standard deviation, if similarity < L    (4.6)

Three different approaches were tested: using the user's rating average and the user's standard
deviation; using the recipe's rating average and the recipe's standard deviation; and using the
combined average of the user's and the recipe's averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
  Number of users                       24,741         8,117
  Number of food items                 226,025        14,976
  Number of rating events              956,826        86,574
  Number of ratings above avg          726,467        46,588
  Number of groups                         108            68
  Number of ingredients                  5,074           338
  Number of categories                      28            14
  Sparsity on the ratings matrix         0.02%         0.07%
  Avg rating value                        4.68          3.34
  Avg number of ratings per user         38.67         10.67
  Avg number of ratings per item          4.23          5.78
  Avg number of ingredients per item      8.57          3.71
  Avg number of categories per item       2.33          0.60
  Avg number of food groups per item      0.87          0.61
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine with more accuracy
the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L
is 0.25.
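Eq. (4.6), with the initial thresholds U = 0.75 and L = 0.25, can be sketched as:

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Eq. 4.6: push the rating one standard deviation above (or below)
    # the average when the similarity is high (or low).
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

# Combined user/recipe average 3.4 with standard deviation 0.6.
print(threshold_rating(0.8, 3.4, 0.6))   # high similarity: above average
print(threshold_rating(0.5, 3.4, 0.6))   # middle band: average
print(threshold_rating(0.1, 3.4, 0.6))   # low similarity: below average
```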
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the
recommendation system in order to generate recommendations. The data for the experiments is provided
by two datasets. The first dataset was previously made available by [25], collected from a large
online recipe sharing community.⁴ The second dataset is composed of crawled data obtained from a
website named Epicurious.⁵ This dataset initially contained 51,324 active users and 160,536 rated
recipes, but, in order to reduce data sparsity, the dataset has been filtered: all recipes that were rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table
4.1, a statistical characterization of the two datasets, after the filter was applied, is presented.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.

⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way that ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas
in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen
by the website's users when performing a review.

In Figures 4.3, 4.4, and 4.5, some graphical statistical data on the datasets is presented. Figures
4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.⁶ Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.

⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides an insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the data was partitioned into 5 equal segments, so the process is repeated 5
times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the
training set the remaining 80% of the data.
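As an illustration, the fold construction described above can be sketched as follows. This is a minimal Python sketch, not the actual implementation used in this work; the function name and the (user, item, rating) tuple layout are assumptions:

```python
import random

def five_fold_splits(rating_events, k=5, seed=42):
    """Yield (training, validation) pairs for k-fold cross-validation.

    Each fold uses ~1/k of the shuffled rating events (20% for k=5) as the
    validation set and the remaining events (80%) as the training set.
    """
    events = list(rating_events)
    random.Random(seed).shuffle(events)        # fixed seed for reproducibility
    folds = [events[i::k] for i in range(k)]   # k disjoint segments
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Toy usage: 10 rating events split into 5 folds of 2 validation events each.
data = [(user, item, 3) for user in range(2) for item in range(5)]
for training, validation in five_fold_splits(data):
    assert len(validation) == 2 and len(training) == 8
```

Every rating event appears in the validation set of exactly one fold, which is what allows the error measures to be averaged over all folds.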
Accuracy is measured when comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
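As a reference, the two measures can be computed directly from the paired lists of actual and predicted ratings. The sketch below follows their standard definitions; the helper names are illustrative:

```python
import math

def mae(actual, predicted):
    # Mean Absolute Error: the average magnitude of the deviations.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root Mean Squared Error: similar to MAE, but squaring the deviations
    # places more emphasis on larger errors.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 4]
print(round(mae(actual, predicted), 4))   # → 0.875
print(round(rmse(actual, predicted), 4))  # → 1.1456
```

Note that RMSE is always at least as large as MAE on the same data, which is why the two can move in opposite directions when the distribution of errors changes, as observed later in Section 5.4.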
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, first some
baselines need to be computed. Using the 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset
averages as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item
averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious          Foodcom
                                  MAE      RMSE      MAE      RMSE
YoLP Content-based component      0.6389   0.8279    0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678    0.3761   0.6834
User Average                      0.6315   0.8338    0.4077   0.6207
Item Average                      0.7701   1.0930    0.4385   0.7043
Combined Average                  0.6628   0.8572    0.4180   0.6250
Table 5.2: Test Results

                                  Epicurious                              Foodcom
                                  Observation:      Observation:         Observation:      Observation:
                                  User Average      Fixed Threshold      User Average      Fixed Threshold
                                  MAE      RMSE     MAE      RMSE        MAE      RMSE     MAE      RMSE
User Avg + User Standard
Deviation                         0.8217   1.0606   0.7759   1.0283      0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard
Deviation                         0.8914   1.1550   0.8388   1.1106      0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and
Item Standard Deviation           0.8304   1.0296   0.7824   0.9927      0.4390   0.6506   0.4324   0.6449
Min-Max                           0.8539   1.1533   0.7721   1.0705      0.6648   0.9847   0.6303   0.9384
inputs, the recommendation system simply returns the userID average, or the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
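These average-based baselines are simple enough to sketch in a few lines. The following is an illustrative reconstruction with hypothetical names and toy data, not the actual YoLP code:

```python
from collections import defaultdict

def baseline_predictors(training):
    """Build the three average-based baselines from (user, item, rating) events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}

    def predict(user, item, mode="combined"):
        if mode == "user":
            return user_avg[user]
        if mode == "item":
            return item_avg[item]
        return (user_avg[user] + item_avg[item]) / 2  # (UserAvg + ItemAvg)/2

    return predict

predict = baseline_predictors([("ana", "soup", 4), ("ana", "cake", 2), ("rui", "soup", 5)])
print(predict("ana", "soup"))  # → 3.75, i.e. (3.0 + 4.5) / 2
```

Despite their simplicity, Table 5.1 shows these baselines are competitive, which is why they are a meaningful reference point for the experimental component.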
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation: User Average and Observation: Fixed Threshold.
As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the line entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than the fixed
threshold of 3 to separate the positive and negative observations. The second conclusion that can
be drawn from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                     Epicurious          Foodcom
                                     MAE      RMSE      MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927    0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012    0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986    0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616    0.4324   0.7087
Ingredients                          0.7932   1.0054    0.4411   0.6537
Cuisine                              0.8553   1.0810    0.5357   0.7431
Dietary                              0.8772   1.0807    0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine
and dietary. In content-based methods, it is important to determine whether all features are helping to obtain
the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
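One plausible reading of the first point is sketched below, assuming recipes are represented as sparse dictionaries of feature weights. The β and γ weights follow the classic Rocchio formulation; whether a rating of exactly 3 counts as positive or negative, and the actual weights used in this work, are assumptions of the sketch, not details taken from the dissertation:

```python
def build_prototype(rated_recipes, threshold=3, beta=1.0, gamma=1.0):
    """Rocchio-style user prototype from (recipe_vector, rating) pairs.

    Recipes rated above the fixed threshold count as positive observations,
    the rest as negative; vectors are dicts mapping feature -> weight.
    """
    positive = [v for v, r in rated_recipes if r > threshold]
    negative = [v for v, r in rated_recipes if r <= threshold]
    prototype = {}
    # Add the (averaged) positive vectors and subtract the negative ones.
    for group, sign, size in ((positive, beta, len(positive)),
                              (negative, -gamma, len(negative))):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] = prototype.get(feature, 0.0) + sign * weight / max(size, 1)
    return prototype

proto = build_prototype([({"garlic": 1.0, "rice": 0.5}, 5),
                         ({"garlic": 1.0, "sugar": 0.8}, 2)])
print(proto["garlic"])  # → 0.0 (appeared once in a liked and once in a disliked recipe)
```

Features that appear mostly in positively rated recipes end up with positive prototype weights, which is what later makes the cosine similarity with a liked recipe high.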
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform. For each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
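The vector-merging trick can be illustrated as follows. This sketch assumes the three per-feature-type vectors use disjoint keys, so merging them is a plain dictionary union; the names and toy weights are hypothetical:

```python
import math

def merge(*vectors):
    # Merge the per-feature-type vectors (ingredients, cuisines, dietaries)
    # stored for each user into a single prototype vector.
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

def cosine(u, v):
    # Cosine similarity between two sparse vectors (dicts of feature -> weight).
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

ingredients = {"garlic": 0.9}
cuisines = {"italian": 0.4}
dietaries = {"vegetarian": 0.2}

# Testing "Ingredients + Cuisine" only requires merging two of the three stored vectors.
prototype = merge(ingredients, cuisines)
recipe = {"garlic": 1.0, "italian": 1.0}
print(round(cosine(prototype, recipe), 4))  # → 0.9333
```

Because the three vectors are stored separately, trying a new feature combination only costs a dictionary merge instead of a full prototype rebuild over all 5 folds.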
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them, and although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:
Rating =
    average rating + standard deviation,   if similarity ≥ U
    average rating,                        if L ≤ similarity < U      (4.6)
    average rating − standard deviation,   if similarity < L
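In code, this piecewise mapping might look as follows; a sketch of Eq. 4.6 with the initial thresholds, where the function name is hypothetical and the average and standard deviation may be the user's, the item's, or their combination:

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Map a cosine similarity onto the rating scale (Eq. 4.6)."""
    if similarity >= upper:
        return avg + std   # very similar: predict above the average
    if similarity >= lower:
        return avg         # middling similarity: predict the plain average
    return avg - std       # dissimilar: predict below the average

print(round(similarity_to_rating(0.9, avg=3.5, std=0.9), 1))  # → 4.4
print(round(similarity_to_rating(0.5, avg=3.5, std=0.9), 1))  # → 3.5
print(round(similarity_to_rating(0.1, avg=3.5, std=0.9), 1))  # → 2.6
```

Varying `upper` and `lower` is exactly the threshold sweep performed in this section; setting `lower=0` reproduces the two-case mapping of Eq. 5.1.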
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendations and discover the similarity case thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and
Foodcom datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error values
seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset
Rating =
    average rating + standard deviation,   if similarity ≥ U
    average rating,                        if similarity < U          (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better
results when using the Foodcom dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and whether
the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in that proximity.
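A simple way to reproduce such a trend line is to bin users by standard deviation and average the absolute errors inside each bin. The bin width below is an assumption, since the dissertation does not specify how the line was computed:

```python
from collections import defaultdict

def average_error_by_std(users, bin_width=0.25):
    """Group (std, abs_error) user points into standard-deviation bins and
    average the absolute errors in each bin, mimicking the trend line."""
    bins = defaultdict(list)
    for std, abs_error in users:
        bins[int(std / bin_width)].append(abs_error)
    # Map each bin back to its left edge on the standard-deviation axis.
    return {round(b * bin_width, 2): sum(e) / len(e) for b, e in sorted(bins.items())}

points = [(0.0, 0.1), (0.2, 0.3), (0.6, 0.5), (0.7, 0.7), (1.4, 0.9)]
print(average_error_by_std(points))  # average error per std bin
```

A flat tail in the resulting curve is the stagnation discussed below; a tail that keeps rising would suggest the predictions are dominated by the per-user average rather than the learned preferences.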
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Considering the small dimensionality of this dataset, and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the
objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and analyse whether the recommendation error starts to converge after a certain number of
reviews are made. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendations. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable number of users to average the
recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes
Figure 5.9: Learning Curve using the Foodcom dataset up to 100 rated recipes
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendations, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning Curve using the Foodcom dataset up to 500 rated recipes
a threshold where the recommendation error stagnates.
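The round-by-round simulation described in this section can be sketched as a small harness that grows the training set one review at a time. The prototype builder and predictor are passed in as callables, since their concrete forms are not reproduced here; the harness itself is an illustrative reconstruction:

```python
def learning_curve(user_ratings, build_prototype, predict, max_train=40):
    """Simulate incremental learning for one user.

    At round n, the first n rated recipes train the prototype and the
    remaining ones validate it; each round yields one MAE value, so the
    returned list is the user's learning curve.
    """
    errors = []
    for n in range(1, min(max_train, len(user_ratings) - 1) + 1):
        train, validation = user_ratings[:n], user_ratings[n:]
        prototype = build_prototype(train)
        abs_errors = [abs(predict(prototype, recipe) - rating)
                      for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

# Toy usage: a constant predictor on constant ratings gives a flat, zero curve.
ratings = [("recipe%d" % i, 3) for i in range(10)]
curve = learning_curve(ratings, build_prototype=lambda train: None,
                       predict=lambda prototype, recipe: 3)
print(curve)  # → [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Averaging these per-user curves over the user groups identified above (71, 1571 and 269 users) would produce the aggregate curves of Figures 5.8 to 5.10.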
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Foodcom dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results
in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, which were chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of
detail, both from the recipes and from the prototype vectors, and, added to the major difference in the
dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendations, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e.,
lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these
features have on the recommendations is also another interesting point to approach in the future,
when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information Filtering: Overview of Issues, Research and
Systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868.
doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction,
volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's Food Preference Extraction for Personalized
Cooking Recipe Recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105,
2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer
Science and Information Systems - A Landscape of Research. In E-Commerce and Web
Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3.
doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web,
4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive
Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN
0897915240.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text
Categorization. In Proceedings of the Fourteenth International Conference on Machine
Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive Algorithms for
Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A
Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender
Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and
Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403,
1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.
963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to Know You: Learning New User Preferences in Recommender Systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN
1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based
Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by
Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation
and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi:
10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling
and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:
1026501525781.

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.
963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe
Recommendation. Volume 8444, 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN
1045-0823.
4.4 Database and Datasets 31

5 Validation 35
5.1 Evaluation Metrics and Cross Validation 35
5.2 Baselines and First Results 36
5.3 Feature Testing 38
5.4 Similarity Threshold Variation 39
5.5 Standard Deviation Impact in Recommendation Error 42
5.6 Rocchio's Learning Curve 43

6 Conclusions 47
6.1 Future Work 48

Bibliography 49
List of Tables
2.1 Ratings database for collaborative recommendation 10
4.1 Statistical characterization for the datasets used in the experiments 31
5.1 Baselines 37
5.2 Test Results 37
5.3 Testing features 38
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of
Computer Science (CS) and Information Systems (IS) [4] 4
2.2 Comparing user ratings [2] 11
2.3 Monolithic hybridization design [2] 13
2.4 Parallelized hybridization design [2] 13
2.5 Pipelined hybridization designs [2] 13
2.6 Popular evaluation measures in studies about recommendation systems from the
area of Computer Science (CS) or the area of Information Systems (IS) [4] 14
2.7 Evaluating recommended items [2] 15
3.1 Recipe - ingredient breakdown and reconstruction 21
3.2 Normalized MAE score for recipe recommendation [22] 22
4.1 System Architecture 26
4.2 Item-to-item collaborative recommendation1 26
4.3 Distribution of Epicurious rating events per rating values 32
4.4 Distribution of Foodcom rating events per rating values 32
4.5 Epicurious distribution of the number of ratings per number of users 33
5.1 10 Fold Cross-Validation example 36
5.2 Lower similarity threshold variation test using Epicurious dataset 39
5.3 Lower similarity threshold variation test using Foodcom dataset 40
5.4 Upper similarity threshold variation test using Epicurious dataset 40
5.5 Upper similarity threshold variation test using Foodcom dataset 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset 42
5.7 Mapping of the user's absolute error and standard deviation from the Foodcom dataset 43
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes 44
5.9 Learning Curve using the Foodcom dataset up to 100 rated recipes 44
5.10 Learning Curve using the Foodcom dataset up to 500 rated recipes 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that better fit the
user's preferences are chosen. The process of filtering high amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services
related to movies, music, books, social bookmarking, and product sales in general, and
new ones are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can apply to food recommendation, is a topic of great
interest.
In this work, the applicability of content-based methods in personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. This work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed: the algorithm's learning curve and the impact of the standard deviation on the
recommendation error were also analysed. Furthermore, a feature test was performed to discover
the feature combination that better characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview on recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made recommendation systems are usually classified into the
following categories [2]
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify the solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview on hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free-text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of that term to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed by two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \qquad (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by \max_z f_{zj}, the maximum frequency observed over all keywords z in the document j.
Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right) \qquad (2.2)
In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

w_{ij} = TF_{ij} \times IDF_i \qquad (2.3)
It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when weighting a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \qquad (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \qquad (2.5)
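To make the pipeline above concrete, the following Python sketch computes TF (Eq. 2.1), IDF (Eq. 2.2), cosine-normalized TF-IDF vectors (Eqs. 2.3 and 2.4), and the cosine similarity of Eq. (2.5). The toy documents and all function names are illustrative, not part of the original work:

```python
import math

def tf(term_counts):
    """Eq. (2.1): frequency of each term, normalized by the most frequent one."""
    max_f = max(term_counts.values())
    return {t: f / max_f for t, f in term_counts.items()}

def idf(docs):
    """Eq. (2.2): inverse document frequency over a list of term-count dicts."""
    n = len(docs)
    vocab = {t for d in docs for t in d}
    return {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}

def tfidf_vector(term_counts, idf_weights):
    """Eq. (2.3) combined with the cosine normalization of Eq. (2.4)."""
    weights = {t: f * idf_weights[t] for t, f in tf(term_counts).items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights

def cosine_similarity(a, b):
    """Eq. (2.5): dot product divided by the product of the vector norms."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: three documents given as raw term counts.
docs = [{"pasta": 2, "tomato": 1}, {"pasta": 1, "cream": 2}, {"salad": 3}]
idf_w = idf(docs)
v0 = tfidf_vector(docs[0], idf_w)
v1 = tfidf_vector(docs[1], idf_w)
print(round(cosine_similarity(v0, v1), 3))  # → 0.108
```

Note how the shared term "pasta" gives the first two documents a small but non-zero similarity, while its IDF is lower than that of the terms appearing in a single document.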
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, being T the vocabulary composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \qquad (2.6)

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
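A minimal sketch of this classification scheme, assuming TF-IDF vectors are already available, might look as follows in Python. The two classes, the example vectors, and the \beta and \gamma defaults are illustrative choices, not values from the text:

```python
import math

def rocchio_prototype(pos_vectors, neg_vectors, beta=1.0, gamma=0.25):
    """Eq. (2.6): beta-weighted mean of the positive examples minus a
    gamma-weighted mean of the negatives, computed per term."""
    terms = {t for v in pos_vectors + neg_vectors for t in v}
    proto = {}
    for t in terms:
        pos = sum(v.get(t, 0.0) for v in pos_vectors) / len(pos_vectors)
        neg = sum(v.get(t, 0.0) for v in neg_vectors) / len(neg_vectors)
        proto[t] = beta * pos - gamma * neg
    return proto

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc_vector, prototypes):
    """Assign the document to the class whose prototype is most similar."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], doc_vector))

# Toy TF-IDF vectors for two classes of items: "liked" and "disliked".
liked = [{"garlic": 0.9, "pasta": 0.4}, {"garlic": 0.7, "tomato": 0.5}]
disliked = [{"liver": 0.8}, {"liver": 0.6, "onion": 0.3}]
prototypes = {
    "liked": rocchio_prototype(liked, disliked),
    "disliked": rocchio_prototype(disliked, liked),
}
print(classify({"garlic": 0.5, "pasta": 0.2}, prototypes))  # → liked
```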
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• P(c): probability of observing a document in class c
• P(d|c): probability of observing the document d, given a class c
• P(d): probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c) \, P(d|c)}{P(d)} \qquad (2.7)
When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)} \qquad (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, indicating whether or not the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \qquad (2.9)

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
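A possible implementation sketch of the multinomial model of Eq. (2.9), computed in log space for numerical stability, follows. The Laplace smoothing is a common implementation choice added here, not part of the equation, and the toy corpus is invented:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels):
    """Estimate P(c) and P(t|c); Laplace smoothing is an implementation
    choice added here, not part of Eq. (2.9)."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for doc in docs for t in doc}
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {
        c: {t: (term_counts[c][t] + 1) / (sum(term_counts[c].values()) + len(vocab))
            for t in vocab}
        for c in class_counts
    }
    return priors, likelihoods

def classify(doc, priors, likelihoods):
    """Eq. (2.9) in log space: argmax_c log P(c) + sum_t N(d,t) log P(t|c)."""
    def score(c):
        return math.log(priors[c]) + sum(
            n * math.log(likelihoods[c][t])
            for t, n in Counter(doc).items()
            if t in likelihoods[c]  # words outside the vocabulary are ignored
        )
    return max(priors, key=score)

docs = [["great", "meal", "great"], ["awful", "meal"], ["great", "taste"]]
labels = ["relevant", "irrelevant", "relevant"]
priors, likes = train_multinomial_nb(docs, labels)
print(classify(["great", "great"], priors, likes))  # → relevant
```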
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of these nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they have no training phase and all the computation is made at classification time.
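For structured data, the lazy k-nearest-neighbor scheme just described can be sketched as follows (the feature vectors, labels, and function names are illustrative):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equally-sized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(item, training, k=3):
    """Lazy learner: no training phase; compare the new item against every
    stored example and take a majority vote among the k nearest ones."""
    neighbors = sorted(training, key=lambda ex: euclidean(item, ex[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Toy structured data: (feature vector, class label) pairs.
training = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.0), "B"), ((5.1, 4.8), "B")]
print(knn_classify((1.1, 1.0), training))  # → A
```

The full scan over `training` at query time is exactly the classification-time cost noted above.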
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user, based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd" and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \sqrt{\sum_{s \in S} r_{bs}^2}} \qquad (2.10)

In the formula, r_{as} is the rating that user a gave to item s and r_{bs} is the rating that user b gave to the same item s. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items common between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \qquad (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using any of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \times (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)
In the formula, pred(a, p) is the predicted value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
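Using the ratings of Table 2.1, the Pearson similarity of Eq. (2.11) and the prediction function of Eq. (2.12) can be sketched in Python. This is a hand-rolled illustration; fixing the neighbor set to User1 and User2 is a choice made for the example:

```python
import math

ratings = {  # Table 2.1; Alice's rating for Item5 is the one to predict
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    return sum(ratings[user].values()) / len(ratings[user])

def pearson(a, b):
    """Eq. (2.11), computed over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = mean(a), mean(b)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common)
                    * sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, neighbors):
    """Eq. (2.12): the user's mean plus similarity-weighted rating deviations."""
    num = sum(pearson(a, b) * (ratings[b][item] - mean(b)) for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    return mean(a) + num / den if den else mean(a)

print(round(predict("Alice", "Item5", ["User1", "User2"]), 2))  # → 4.85
```

Both neighbors rated Item5 above their own averages, so the prediction lands above Alice's average of 4.0.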
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem; most of them use the hybrid recommendation approach presented in the next section, while other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallelized and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good, or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp} \qquad (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = \frac{tp}{tp + fn} \qquad (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16)
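These four measures are straightforward to compute; the following Python sketch, with made-up prediction values, illustrates Eqs. (2.13) to (2.16):

```python
import math

def precision(tp, fp):
    """Eq. (2.13): exactness of the recommendations."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (2.14): completeness of the recommendations."""
    return tp / (tp + fn)

def mae(predicted, actual):
    """Eq. (2.15): mean absolute deviation of predicted from actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Eq. (2.16): like MAE, but squaring penalizes larger deviations more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.5, 3.0, 2.0, 5.0]  # made-up predicted ratings
actual = [4, 3, 4, 5]             # made-up actual ratings
print(precision(tp=3, fp=1))              # → 0.75
print(round(mae(predicted, actual), 3))   # → 0.625
print(round(rmse(predicted, actual), 3))  # → 1.031
```

Note how the single large error (2.0 predicted vs. 4 actual) pushes RMSE well above MAE.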
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and from cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^+_k, an equation based on the idea of TF-IDF is used:

I^+_k = FF_k \times IRF_k \qquad (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D} \qquad (3.2)

The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k} \qquad (3.3)
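A small Python sketch of this scoring (Eqs. 3.1 to 3.3) follows, with hypothetical usage counts and a toy recipe database, since the paper's data is not reproduced here:

```python
import math

def favourite_ingredient_scores(used_counts, period_days, recipe_db):
    """I+_k = FF_k * IRF_k (Eqs. 3.1 to 3.3): frequent use raises the score,
    while ubiquity across the recipe database lowers it."""
    scores = {}
    for k, f_k in used_counts.items():
        ff = f_k / period_days                     # Eq. (3.2)
        m_k = sum(1 for r in recipe_db if k in r)  # recipes containing k
        irf = math.log(len(recipe_db) / m_k)       # Eq. (3.3)
        scores[k] = ff * irf                       # Eq. (3.1)
    return scores

# Hypothetical 30-day cooking history and a toy recipe database.
used = {"garlic": 12, "salt": 25, "basil": 6}
recipes = [{"garlic", "salt"}, {"salt", "basil"}, {"salt"}, {"garlic", "salt", "basil"}]
scores = favourite_ingredient_scores(used, 30, recipes)
print(max(scores, key=scores.get))  # → garlic
```

Salt, despite being used most often, scores zero because it appears in every recipe, which is exactly the damping effect the IRF term is meant to provide.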
The user's disliked ingredients I^-_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall and F-measure for the top N ingredients, sorted by I^+_k, were computed. The F-measure is computed as follows:

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \bar{g}_k \right)^2} \qquad (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I_k^+ and I_k^-, respectively):
\text{Score}(R) = \sum_{k \in R} I_k \cdot W_k \qquad (3.6)
The approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
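The deviation-based weighting and the weighted score (Eqs. 3.5 and 3.6) can be sketched as follows. The quantities, weights and preference values are hypothetical; this is a minimal illustration of the idea, not the published implementation.

```python
import math

def ingredient_std(quantities):
    """Standard deviation of an ingredient's quantity over the recipes
    that contain it (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((q - mean) ** 2 for q in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Recipe score as a weighted sum of ingredient preferences
    (Eq. 3.6); preference[k] plays the role of I_k (liked positive,
    disliked negative) and weight[k] the role of W_k."""
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)

# Hypothetical quantities (in grams) of "pepper" across three recipes
sigma_pepper = ingredient_std([2.0, 4.0, 6.0])
score = recipe_score({"pepper", "potato"},
                     {"pepper": -1.0, "potato": 1.0},
                     {"pepper": 2.0, "potato": 0.5})
```

A disliked ingredient with a large weight (here "pepper") pulls the score down more than a liked low-weight ingredient pulls it up, matching the intuition in the text.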
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
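The construction of a pseudo user-ratings vector (Eq. 3.7) can be sketched as follows. The content-based predictor here is a hypothetical stand-in (a constant function), not the naive Bayes classifier of [20].

```python
def pseudo_vector(user_ratings, content_predict, items):
    """Dense pseudo user-ratings vector (Eq. 3.7): the actual rating
    where the user rated the item, the content-based prediction
    otherwise."""
    return [user_ratings[i] if i in user_ratings else content_predict(i)
            for i in items]

# Hypothetical content-based predictor that always guesses 3.0
items = ["m1", "m2", "m3"]
v = pseudo_vector({"m1": 5.0, "m3": 2.0}, lambda i: 3.0, items)
```

Stacking such vectors for all users yields the dense matrix V on which the Pearson-based collaborative step operates.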
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, on the assumption that the ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories whose similarity to the story to be classified exceeds a minimum threshold become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter exceeds a maximum similarity threshold with respect to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
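The short-term voting scheme described above can be sketched as follows. The threshold values and the two-feature story vectors are hypothetical; this is a simplified illustration of the idea, not the Daily-Learner implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense TF-IDF vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_story_score(new_vec, labeled, t_min=0.3, t_max=0.9):
    """Stories above t_min become voters; a voter above t_max flags the
    new story as already known. Returns (prediction, known), with
    prediction None when there are no voters (defer to the long-term
    model)."""
    voters = []
    for vec, score in labeled:
        s = cosine(new_vec, vec)
        if s >= t_min:
            voters.append((s, score))
    if not voters:
        return None, False
    known = any(s >= t_max for s, _ in voters)
    prediction = (sum(s * r for s, r in voters) /
                  sum(s for s, _ in voters))
    return prediction, known

# Hypothetical labeled stories: (vector, score) pairs
labeled = [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.2)]
prediction, known = predict_story_score([1.0, 0.0], labeled)
```

In a food setting, the "known" flag would correspond to suppressing dishes too similar to recently eaten ones.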
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
\text{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
\text{pred}(u, a) = \frac{\sum_{b \in N} \text{sim}(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} \text{sim}(a, b)} \qquad (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of the similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and to compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence suggests that item-based algorithms can provide, with better computational performance, quality results comparable or superior to those of the best available user-based collaborative filtering algorithms [16, 15].
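Equations 4.1 and 4.2 can be sketched as follows. The toy rating matrix is hypothetical, and note that, as written in Eq. 4.2, the prediction is a deviation from the item averages rather than an absolute rating.

```python
import math

def item_avg(ratings, i):
    """Average rating of item i over all users that rated it."""
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def sim(ratings, a, b):
    """Pearson item-to-item similarity (Eq. 4.1); ratings[user][item]."""
    ra, rb = item_avg(ratings, a), item_avg(ratings, b)
    P = [u for u, r in ratings.items() if a in r and b in r]
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in P)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in P)) * \
          math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in P))
    return num / den if den else 0.0

def predict(ratings, u, a):
    """Similarity-weighted prediction over the user's rated items
    (Eq. 4.2)."""
    pairs = [(sim(ratings, a, b), ratings[u][b] - item_avg(ratings, b))
             for b in ratings[u] if b != a]
    wsum = sum(s for s, _ in pairs)
    return sum(s * d for s, d in pairs) / wsum if wsum else 0.0

# Hypothetical toy matrix: three users, two recipes
ratings = {"u1": {"a": 5, "b": 4}, "u2": {"a": 3, "b": 2}, "u3": {"b": 5}}
prediction = predict(ratings, "u3", "a")
```

In practice the deviation would typically be added back to an item or user average to obtain a rating on the original scale.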
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
\text{Rating} = \begin{cases} \text{avgTotal} + 0.5 & \text{if } \text{similarity} > 0.8 \\ \text{avgTotal} & \text{otherwise} \end{cases} \qquad (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
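The cosine comparison over binary feature vectors and the conversion of Eq. 4.3 can be sketched as follows. The feature names are hypothetical, and `cosine_binary` is the set-based specialization of the cosine measure for 0/1 vectors, not the YoLP implementation itself.

```python
import math

def cosine_binary(profile, recipe):
    """Cosine similarity (Eq. 2.5) specialized for two binary feature
    sets: |intersection| / sqrt(|profile| * |recipe|)."""
    if not profile or not recipe:
        return 0.0
    return len(profile & recipe) / math.sqrt(len(profile) * len(recipe))

def similarity_to_rating(similarity, avg_total):
    """Similarity-to-rating conversion of the content-based component
    (Eq. 4.3)."""
    return avg_total + 0.5 if similarity > 0.8 else avg_total

# Hypothetical feature sets (category, region and ingredient features)
profile = {"italian", "pasta", "tomato", "cheese"}
recipe = {"italian", "pasta", "tomato", "basil"}
s = cosine_binary(profile, recipe)     # 3 / sqrt(16) = 0.75
rating = similarity_to_rating(s, 3.4)  # below 0.8, so avgTotal is kept
```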
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure can be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the Frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
\text{IRF}_k = \log \frac{M}{M_k} \qquad (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
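Computing the IRF weights over a dataset (Eq. 4.4) can be sketched as follows; the three-recipe corpus is a hypothetical toy example.

```python
import math

def irf_weights(recipes):
    """IRF weight for every feature (Eq. 4.4): log(M / M_k), where M is
    the total number of recipes and M_k the number of recipes that
    contain feature k."""
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}

# Hypothetical toy corpus of three recipes
recipes = [{"garlic", "tomato"}, {"garlic", "basil"}, {"garlic"}]
w = irf_weights(recipes)
```

A feature present in every recipe (here "garlic") gets a weight of zero, so it contributes nothing to the prototype vectors, which is the intended discriminative effect.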
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
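The construction of a prototype vector with a fixed observation threshold can be sketched as follows. The ratings, features and weights are hypothetical; the fixed threshold of 3 mimics the Epicurious-style 1-4 scale described above.

```python
def build_prototype(rated_recipes, weights, threshold):
    """Rocchio-style prototype vector sketch: add the IRF weights of
    the features of positively rated recipes, subtract those of
    negatively rated ones (both observation types weighted 1)."""
    prototype = {}
    for rating, features in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * weights.get(k, 0.0)
    return prototype

# Hypothetical rated recipes and IRF weights
rated = [(4, {"garlic", "basil"}), (1, {"garlic", "tofu"})]
w = {"garlic": 0.4, "basil": 1.1, "tofu": 1.1}
p = build_prototype(rated, w, threshold=3)
```

An ingredient appearing in both a liked and a disliked recipe (here "garlic") cancels out, while the remaining features keep their signed weights.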
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:
B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \qquad (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
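The per-user Min-Max mapping, including the fallback to the user average, can be sketched as follows; the interval bounds in the example are hypothetical.

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max,
                   user_avg):
    """Per-user Min-Max mapping of a similarity value into the rating
    range (Eq. 4.5); falls back to the user's average rating when the
    similarity interval is degenerate."""
    if sim_max == sim_min:
        return user_avg
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min

# Hypothetical per-user similarity interval [0.2, 1.0], rating range [1, 5]
r = min_max_rating(0.6, 0.2, 1.0, 1, 5, user_avg=3.5)   # midpoint, ~3.0
fallback = min_max_rating(0.5, 0.4, 0.4, 1, 5, user_avg=3.5)
```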
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:
\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\ \text{average rating} & \text{if } L \leq \text{similarity} < U \\ \text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L \end{cases} \qquad (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                       Food.com    Epicurious
  Number of users                        24741        8117
  Number of food items                  226025       14976
  Number of rating events               956826       86574
  Number of ratings above avg           726467       46588
  Number of groups                         108          68
  Number of ingredients                   5074         338
  Number of categories                      28          14
  Sparsity on the ratings matrix         0.02%       0.07%
  Avg rating values                       4.68        3.34
  Avg number of ratings per user         38.67       10.67
  Avg number of ratings per item          4.23        5.78
  Avg number of ingredients per item      8.57        3.71
  Avg number of categories per item       2.33        0.60
  Avg number of food groups per item      0.87        0.61
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
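The threshold-based conversion of Eq. 4.6 can be sketched as follows, using the initial thresholds U = 0.75 and L = 0.25; the average and standard deviation in the example are hypothetical.

```python
def avg_std_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Threshold-based similarity-to-rating conversion (Eq. 4.6)."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

# e.g. a user with average rating 3.3 and standard deviation 0.6
high = avg_std_rating(0.80, 3.3, 0.6)   # above U: average + std
mid = avg_std_rating(0.50, 3.3, 0.6)    # between L and U: average
low = avg_std_rating(0.10, 3.3, 0.6)    # below L: average - std
```

The same function covers the three tested variants by passing user, recipe, or combined averages and standard deviations as arguments.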
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com   5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistical data on the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 disjoint folds, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
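The 5-fold splitting procedure can be sketched as follows; the shuffling seed and the round-robin fold assignment are illustrative choices, not details from the thesis.

```python
import random

def five_fold_splits(events, seed=0):
    """5-fold cross-validation sketch: shuffle the rating events once,
    then yield (training, validation) pairs where each disjoint
    validation fold holds ~20% of the data."""
    data = list(events)
    random.Random(seed).shuffle(data)
    folds = [data[i::5] for i in range(5)]
    for i in range(5):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i
                    for e in fold]
        yield training, validation

splits = list(five_fold_splits(range(100)))
```

Each rating event appears in exactly one validation fold, so averaging the per-fold MAE and RMSE covers the whole dataset.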
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
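The two deviation measures can be computed as follows; the rating vectors in the example are hypothetical.

```python
import math

def mae_rmse(actual, predicted):
    """Mean Absolute Error and Root Mean Squared Error between the
    validation-set ratings and the system's predictions."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

mae, rmse = mae_rmse([4, 3, 5, 2], [3.5, 3.0, 4.0, 3.0])
```

RMSE penalizes large errors more heavily than MAE, which is why the two metrics can rank algorithms differently in the result tables.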
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious          Food.com
                                  MAE      RMSE      MAE      RMSE
  YoLP Content-based component    0.6389   0.8279    0.3590   0.6536
  YoLP Collaborative component    0.6454   0.8678    0.3761   0.6834
  User Average                    0.6315   0.8338    0.4077   0.6207
  Item Average                    0.7701   1.0930    0.4385   0.7043
  Combined Average                0.6628   0.8572    0.4180   0.6250
Table 5.2: Test Results

                                        Epicurious                              Food.com
                                        Observation:      Observation:          Observation:      Observation:
                                        User Average      Fixed Threshold       User Average      Fixed Threshold
                                        MAE      RMSE     MAE      RMSE         MAE      RMSE     MAE      RMSE
  User Avg + User Standard Deviation    0.8217   1.0606   0.7759   1.0283       0.4448   0.6812   0.4287   0.6624
  Item Avg + Item Standard Deviation    0.8914   1.1550   0.8388   1.1106       0.4561   0.7251   0.4507   0.7207
  User/Item Avg + User and Item
  Standard Deviation                    0.8304   1.0296   0.7824   0.9927       0.4390   0.6506   0.4324   0.6449
  Min-Max                               0.8539   1.1533   0.7721   1.0705       0.6648   0.9847   0.6303   0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
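The three baseline predictors can be sketched as follows; this is a simplified illustration, with hypothetical (user, item, rating) triples rather than the actual dataset handling:

```python
def baseline_predictions(train, user_id, item_id):
    # train is a list of (user, item, rating) triples from the training folds.
    user_ratings = [r for u, i, r in train if u == user_id]
    item_ratings = [r for u, i, r in train if i == item_id]
    user_avg = sum(user_ratings) / len(user_ratings)
    item_avg = sum(item_ratings) / len(item_ratings)
    return {"user_avg": user_avg,
            "item_avg": item_avg,
            "combined": (user_avg + item_avg) / 2}

train = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
print(baseline_predictions(train, "u1", "r1"))
# {'user_avg': 3.0, 'item_avg': 4.5, 'combined': 3.75}
```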
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation: User Average and Observation: Fixed Threshold.
As also detailed in Section 4.3, a few different methods are used to convert the similarity value
returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries
of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
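A minimal sketch of how such prototype vectors could be built, assuming the classic Rocchio weighting of a positive and a negative centroid (the beta and gamma values, feature names, and ratings below are illustrative, not the thesis configuration):

```python
def split_observations(rated, threshold):
    # rated: list of (feature_vector, rating) pairs; a feature vector is
    # a dict mapping a feature to its weight.
    positive = [v for v, r in rated if r >= threshold]
    negative = [v for v, r in rated if r < threshold]
    return positive, negative

def rocchio_prototype(positive, negative, beta=1.0, gamma=0.5):
    # Classic Rocchio: the centroid of the positive observations minus a
    # damped centroid of the negative ones (beta and gamma are the usual
    # Rocchio weights; the values here are illustrative).
    prototype = {}
    for vectors, weight in ((positive, beta), (negative, -gamma)):
        for vector in vectors:
            for feature, w in vector.items():
                prototype[feature] = prototype.get(feature, 0.0) + weight * w / len(vectors)
    return prototype

rated = [({"garlic": 1.0}, 5), ({"tofu": 1.0}, 1)]
positive, negative = split_observations(rated, threshold=3)  # fixed mid-range threshold
print(rocchio_prototype(positive, negative))  # {'garlic': 1.0, 'tofu': -0.5}
```

Switching between the two observation criteria only changes the threshold argument: the user's own average rating versus a fixed value in the middle of the rating range.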
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using
the fixed threshold of 3 to separate the positive and negative observations. The second conclusion
that can be drawn from these results is that the combination of both user and item average ratings
and standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                       Epicurious          Food.com
                                       MAE      RMSE      MAE      RMSE
  Ingredients + Cuisine + Dietaries    0.7824   0.9927    0.4324   0.6449
  Ingredients + Cuisine                0.7915   1.0012    0.4384   0.6502
  Ingredients + Dietary                0.7874   0.9986    0.4342   0.6468
  Cuisine + Dietary                    0.8266   1.0616    0.4324   0.7087
  Ingredients                          0.7932   1.0054    0.4411   0.6537
  Cuisine                              0.8553   1.0810    0.5357   0.7431
  Dietary                              0.8772   1.0807    0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods, it is important to determine whether all features are helping
to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform. For each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
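The per-feature storage scheme can be sketched as follows; the feature names and weights are hypothetical, and merging is a plain dictionary union since the three feature families do not share keys:

```python
def merge_prototypes(stored, features):
    # stored holds one prototype vector per feature family; merging is a
    # plain union, since the feature namespaces do not overlap.
    merged = {}
    for name in features:
        merged.update(stored[name])
    return merged

def cosine(u, v):
    # Cosine similarity between two sparse vectors represented as dicts.
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = sum(w * w for w in u.values()) ** 0.5
    norm_v = sum(w * w for w in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical per-feature prototype vectors for one user.
stored = {"ingredients": {"garlic": 0.8},
          "cuisine": {"italian": 0.5},
          "dietary": {"vegetarian": 0.3}}

# Testing the Ingredients + Cuisine combination without rebuilding anything.
profile = merge_prototypes(stored, ["ingredients", "cuisine"])
recipe = {"garlic": 1.0, "italian": 1.0}
print(round(cosine(profile, recipe), 4))
```

Each feature combination in Table 5.3 then corresponds to a different list of vector names passed to the merge step.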
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information about them is available. Although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example the price of the meal, can increase the correlation between the user's preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:
Rating =  average rating + standard deviation,   if similarity >= U
          average rating,                        if L <= similarity < U
          average rating - standard deviation,   if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and to discover the similarity case thresholds
that return the lowest error values.
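As a sketch, Eq. 4.6 translates directly into a small mapping function, shown here with the initial thresholds U = 0.75 and L = 0.25; the example user statistics are hypothetical:

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Mirrors Eq. 4.6: shift the user's average rating up or down by one
    # standard deviation according to the similarity thresholds.
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

# Hypothetical user with average rating 3.5 and standard deviation 0.9.
for s in (0.80, 0.50, 0.10):
    print(similarity_to_rating(s, avg=3.5, std=0.9))
```

The threshold variation tests below amount to sweeping the `upper` and `lower` arguments and re-running the cross-validation for each setting.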
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and
Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph from Figure 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating =  average rating + standard deviation,   if similarity >= U
          average rating,                        if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation
tests multiple times on the experimental recommendation component, adjusting the upper similarity
value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it is predicting the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user: the point on the graph is positioned
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in that proximity.
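The two quantities plotted in these figures can be sketched as follows; this is a simplified computation with hypothetical users and ratings, not the thesis code:

```python
from statistics import mean, pstdev

def user_error_profile(reviews, predictions):
    # reviews: {user: list of actual ratings}; predictions: {user: list of
    # predicted ratings}. Returns (standard deviation, mean absolute error)
    # per user, i.e. the two axes of the scatter plots.
    profile = {}
    for user, actual in reviews.items():
        predicted = predictions[user]
        abs_error = mean(abs(a - p) for a, p in zip(actual, predicted))
        profile[user] = (pstdev(actual), abs_error)
    return profile

# Hypothetical users: u1 always rates 3 (zero deviation), u2 varies.
reviews = {"u1": [3, 3, 3], "u2": [1, 5, 3]}
predictions = {"u1": [3, 3, 3], "u2": [2, 4, 3]}
print(user_error_profile(reviews, predictions))
```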
Figure 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Figure 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a determined
amount of reviews. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable amount of users to average the
recommendation errors from (see Figure 5.8). In Food.com, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Figure 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Figure 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Figure 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
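The simulation loop described above can be sketched as follows; the model and error function below are deliberately simplistic stand-ins (a running average and its MAE) rather than the Rocchio component itself:

```python
def learning_curve(user_reviews, train_model, evaluate):
    # At each round one more review is moved from the validation set into
    # the training set, and the error on the remaining reviews is recorded.
    errors = []
    for k in range(1, len(user_reviews)):
        train, validation = user_reviews[:k], user_reviews[k:]
        errors.append(evaluate(train_model(train), validation))
    return errors

# Toy stand-ins: the "model" is just the running average rating, and the
# error is the MAE of predicting that average for the held-out reviews.
ratings = [4, 4, 5, 3, 4]
train_model = lambda train: sum(train) / len(train)
evaluate = lambda avg, validation: sum(abs(r - avg) for r in validation) / len(validation)
print(learning_curve(ratings, train_model, evaluate))
```

In the actual experiments, the curve for each user group is obtained by averaging these per-user error sequences.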
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods in personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Being two datasets with very different characteristics, not improving on the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, which were chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors; adding the major difference in the dataset
sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendations, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e.,
lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that
these features have on the recommendation is another interesting point to approach in the future,
when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute to it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer
Science and Information Systems - A Landscape of Research. In E-Commerce and Web
Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3.
doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web,
4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive
Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A
Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender
Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation
algorithms. In Proceedings of the 10th International Conference on World Wide Web,
pages 285–295, 2001. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting
to Know You: Learning New User Preferences in Recommender Systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.
doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based
Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by
Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation,
and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696.
doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe
Recommendation. Volume 8444, 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
List of Tables
2.1 Ratings database for collaborative recommendation . . . 10
4.1 Statistical characterization for the datasets used in the experiments . . . 31
5.1 Baselines . . . 37
5.2 Test Results . . . 37
5.3 Testing features . . . 38
List of Figures
2.1 Popularity of different recommendation paradigms over publications in the areas of
Computer Science (CS) and Information Systems (IS) [4] . . . 4
2.2 Comparing user ratings [2] . . . 11
2.3 Monolithic hybridization design [2] . . . 13
2.4 Parallelized hybridization design [2] . . . 13
2.5 Pipelined hybridization designs [2] . . . 13
2.6 Popular evaluation measures in studies about recommendation systems from the
area of Computer Science (CS) or the area of Information Systems (IS) [4] . . . 14
2.7 Evaluating recommended items [2] . . . 15
3.1 Recipe - ingredient breakdown and reconstruction . . . 21
3.2 Normalized MAE score for recipe recommendation [22] . . . 22
4.1 System Architecture . . . 26
4.2 Item-to-item collaborative recommendation . . . 26
4.3 Distribution of Epicurious rating events per rating values . . . 32
4.4 Distribution of Food.com rating events per rating values . . . 32
4.5 Epicurious distribution of the number of ratings per number of users . . . 33
5.1 10-Fold Cross-Validation example . . . 36
5.2 Lower similarity threshold variation test using the Epicurious dataset . . . 39
5.3 Lower similarity threshold variation test using the Food.com dataset . . . 40
5.4 Upper similarity threshold variation test using the Epicurious dataset . . . 40
5.5 Upper similarity threshold variation test using the Food.com dataset . . . 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset . . . 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset . . . 43
5.8 Learning Curve using the Epicurious dataset, up to 40 rated recipes . . . 44
5.9 Learning Curve using the Food.com dataset, up to 100 rated recipes . . . 44
5.10 Learning Curve using the Food.com dataset, up to 500 rated recipes . . . 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that better fit the
user's preferences are chosen. The process of filtering high amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online
services related to movies, music, books, social bookmarking, and product sales in general, and
new ones are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and how they can apply to food recommendation, is a topic of great
interest.
In this work, the applicability of content-based methods in personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy level of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering the ingredients in a recipe as similar to the
words in a document led to the variation of TF-IDF developed in [3]. This work presented good results
in retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed: the algorithm's learning curve and the impact of the standard deviation on the
recommendation error were also analysed. Furthermore, a feature test was performed to discover
the feature combination that better characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview on
recommendation systems, introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation
approaches are analysed, and interesting features in the context of personalized food recommendation
are highlighted. In Chapter 4, the modules that compose the architecture of the developed
system are described; the recommendation methods are explained in detail, and the datasets are
introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work, and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the following chapter of related work. These
concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based
recommendations. Content-based approaches, instead, relate more to classical Information Retrieval
based methods, and focus on keywords as content descriptors to generate recommendations.
Because of this, content-based methods are very popular when recommending documents, news
articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and
preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify a solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of
collaborative and content-based systems, such as the well-known cold-start problem, which is explained
later in this section.
In the rest of this section some of the most popular approaches for content-based and collabo-
rative methods are described followed with a brief overview on hybrid recommendation systems
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known;
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free-text. As mentioned previously, the content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight relates to the relevance associated between it and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure (TF-IDF) is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \quad (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right) \quad (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

w_{ij} = TF_{ij} \times IDF_i \quad (2.3)
It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when assigning a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \quad (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = \frac{\sum_k w_{ka} \, w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \quad (2.5)
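As an illustration of Eqs. (2.1) to (2.5), the following Python sketch builds TF-IDF weight vectors for a small toy corpus and compares documents with cosine similarity. The corpus and the pre-tokenized documents are illustrative assumptions, not data from this work:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute sparse TF-IDF weight vectors (Eqs. 2.1-2.3) for token lists."""
    n_docs = len(docs)
    # Document frequency n_i: number of documents containing keyword i.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_f = max(counts.values())                      # max_z f_zj in Eq. (2.1)
        vec = {t: (f / max_f) * math.log(n_docs / df[t])  # TF * IDF
               for t, f in counts.items()}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (Eq. 2.5)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * \
           math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

docs = [["food", "recommendation", "systems"],
        ["food", "recipes", "ingredients"],
        ["movie", "recommendation", "ratings"]]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # low similarity: only "food" is shared
```

The dictionaries act as sparse vectors: keywords absent from a document simply have weight zero, which keeps the dot product in Eq. (2.5) cheap to compute.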
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, with T being the vocabulary, i.e., the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \quad (2.6)

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the
influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
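A minimal sketch of the prototype-vector computation in Eq. (2.6) could look as follows; the \beta and \gamma values and the toy TF-IDF vectors are illustrative assumptions:

```python
def rocchio_prototype(pos_docs, neg_docs, beta=16.0, gamma=4.0):
    """Compute the class prototype vector of Eq. (2.6).

    pos_docs / neg_docs: lists of sparse TF-IDF vectors (dicts term -> weight)
    for the positive (POS_i) and negative (NEG_i) training examples of a class.
    """
    proto = {}
    for doc in pos_docs:  # positive contribution, scaled by beta / |POS_i|
        for term, w in doc.items():
            proto[term] = proto.get(term, 0.0) + beta * w / len(pos_docs)
    for doc in neg_docs:  # negative contribution, scaled by gamma / |NEG_i|
        for term, w in doc.items():
            proto[term] = proto.get(term, 0.0) - gamma * w / len(neg_docs)
    return proto

# Toy example: one positive and one negative training document.
pos = [{"pasta": 0.8, "tomato": 0.5}]
neg = [{"tomato": 0.9, "cinema": 0.7}]
proto = rocchio_prototype(pos, neg)
# "pasta" ends up with a positive weight and "cinema" with a negative one;
# a new document is assigned to the class whose prototype is most similar
# to its vector, e.g. by cosine similarity (Eq. 2.5).
```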
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities, gathered from previously observed data, in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
• P(c): the probability of observing a document in class c;
• P(d|c): the probability of observing the document d given a class c;
• P(d): the probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c) \, P(d|c)}{P(d)} \quad (2.7)

When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)} \quad (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.

In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to observe the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of words in a document. The second, typically referred to as the multinomial event model, considers the number of times the words appear in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \quad (2.9)

In the formula, N(d_i, t_k) represents the number of times the word (or term) t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
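A minimal sketch of the multinomial model of Eq. (2.9) follows. The Laplace (add-one) smoothing and the toy training documents are assumptions added here, since Eq. (2.9) alone would assign zero probability to any document containing a word unseen for a class:

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal multinomial Naive Bayes (Eq. 2.9), using log-probabilities
    and Laplace (add-one) smoothing to avoid zero word probabilities."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        best, best_score = None, float("-inf")
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.priors[c])   # log P(c_j)
            for word in doc:                   # adds N(d_i, t_k) * log P(t_k|c_j)
                p = (self.word_counts[c][word] + 1) / (total + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best

nb = MultinomialNB().fit(
    docs=[["great", "pasta", "recipe"], ["boring", "plot", "movie"]],
    labels=["relevant", "irrelevant"])
print(nb.predict(["pasta", "recipe"]))  # -> "relevant"
```

Working in log-space replaces the product of Eq. (2.9) by a sum, which avoids numerical underflow for long documents.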
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is made at classification time.
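The classification step just described can be sketched as follows, assuming structured data and the Euclidean distance metric; the toy feature vectors and labels are illustrative:

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    training: list of (feature_vector, label) pairs. Euclidean distance is
    used, as is common for structured data; for VSM vectors, cosine
    similarity would be used instead.
    """
    by_distance = sorted(
        training,
        key=lambda item: math.dist(query, item[0]))  # Euclidean distance
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy structured data: (price, rating) pairs labelled by user preference.
training = [((10.0, 4.5), "liked"), ((12.0, 4.0), "liked"),
            ((30.0, 2.0), "disliked"), ((28.0, 2.5), "disliked")]
print(knn_classify((11.0, 4.2), training, k=3))  # -> "liked"
```

Note that the whole training set is scanned for every query, which is exactly the classification-time inefficiency mentioned above.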
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based approaches (or heuristic-based) and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

          Item1   Item2   Item3   Item4   Item5
Alice       5       3       4       4       ?
User1       3       1       2       3       3
User2       4       3       4       3       5
User3       3       3       1       5       4
User4       1       5       5       2       1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S, previously mentioned, represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \sqrt{\sum_{s \in S} r_{bs}^2}} \quad (2.10)

In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave to the same item s. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}} \quad (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using any of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \times (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \quad (2.12)

In the formula, pred(a, p) is the predicted rating for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
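The steps above can be reproduced in code. The sketch below predicts Alice's rating for Item5 from Table 2.1, using the Pearson coefficient of Eq. (2.11) and the prediction function of Eq. (2.12); the choice of User1 and User2 as the neighborhood N is an illustrative assumption:

```python
import math

ratings = {  # Table 2.1; None marks the unknown rating.
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def pearson(a, b):
    """Pearson correlation over the items rated by both users (Eq. 2.11)."""
    common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, neighbors):
    """Weighted deviation from the neighbors' mean ratings (Eq. 2.12);
    the neighbors are assumed to be positively correlated with the user."""
    a = ratings[user]
    rated = [r for r in a if r is not None]
    mean_a = sum(rated) / len(rated)
    num = den = 0.0
    for name in neighbors:
        b = ratings[name]
        sim = pearson(a, b)
        mean_b = sum(b) / len(b)  # neighbors here have rated every item
        num += sim * (b[item] - mean_b)
        den += sim
    return mean_a + num / den

print(predict("Alice", 4, ["User1", "User2"]))  # -> about 4.87
```

The result, roughly 4.87, is well above Alice's average of 4, since both neighbors rated Item5 above their own averages.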
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [15, 16].
Model-based algorithms use a collection of ratings (i.e., training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input, as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp} \quad (2.13)

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = \frac{tp}{tp + fn} \quad (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \quad (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \quad (2.16)
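The four measures above (Eqs. 2.13 to 2.16) can be sketched in a few lines of Python; the toy recommendation sets and rating lists are illustrative assumptions:

```python
import math

def precision_recall(recommended, relevant):
    """Precision (Eq. 2.13) and Recall (Eq. 2.14) from two item sets."""
    tp = len(recommended & relevant)  # true positives
    return tp / len(recommended), tp / len(relevant)

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); penalizes large deviations more."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Toy example: 4 recommended items, 3 actually relevant ones.
recommended = {"item1", "item2", "item3", "item4"}
relevant = {"item2", "item3", "item5"}
print(precision_recall(recommended, relevant))  # (0.5, 0.666...)
print(mae([4.0, 3.5, 2.0], [5, 3, 2]))          # -> 0.5
print(rmse([4.0, 3.5, 2.0], [5, 3, 2]))         # larger than MAE here
```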
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.

1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I_k^+, an equation based on the idea of TF-IDF is used:

I_k^+ = FF_k \times IRF_k \quad (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D} \quad (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k} \quad (3.3)
The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from "love" to "hate". To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I_k^+, were computed. The F-measure is computed as follows:

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed by the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \quad (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I_k^+ and I_k^-, respectively):

Score(R) = \sum_{k \in R} (I_k \cdot W_k) \quad (3.6)
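A minimal sketch of the preference extraction (Eqs. 3.1 to 3.3) and the scoring function (Eq. 3.6) follows. The toy cooking history, the recipe set, and the ingredient weights W_k (which in the original work derive from the deviation score) are illustrative assumptions:

```python
import math

def favourite_scores(cooked_counts, period_days, recipes):
    """Score ingredients with FF_k * IRF_k (Eqs. 3.1-3.3).

    cooked_counts: times each ingredient was used during the period (F_k);
    each ingredient is assumed to appear in at least one recipe.
    recipes: list of ingredient sets, used to compute IRF_k.
    """
    n_recipes = len(recipes)  # M
    scores = {}
    for ing, freq in cooked_counts.items():
        ff = freq / period_days                           # Eq. (3.2)
        containing = sum(1 for r in recipes if ing in r)  # M_k
        irf = math.log(n_recipes / containing)            # Eq. (3.3)
        scores[ing] = ff * irf                            # Eq. (3.1)
    return scores

def recipe_score(recipe, preferences, weights):
    """Score a recipe by Eq. (3.6): sum of I_k * W_k over its ingredients."""
    return sum(preferences.get(k, 0.0) * weights.get(k, 1.0) for k in recipe)

recipes = [{"egg", "rice"}, {"egg", "pepper"}, {"rice", "potato"}, {"pepper"}]
prefs = favourite_scores({"egg": 6, "pepper": 2}, period_days=30, recipes=recipes)
# Ingredients cooked often, yet rare across recipes, score highest.
print(recipe_score({"egg", "pepper"}, prefs, weights={"egg": 1.0, "pepper": 0.8}))
```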
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering, and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF essentially consists in performing collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:
    v_{u,i} =  r_{u,i}   if user u rated item i
               c_{u,i}   otherwise                                    (3.7)
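The construction of the pseudo user-ratings vector in Eq. (3.7) can be sketched as follows; this is a minimal illustration, where `content_predict` is a hypothetical stand-in for the pure content-based predictor, which the article [20] implements as a naive Bayesian classifier:

```python
def build_pseudo_vector(actual_ratings, all_items, content_predict):
    """Build the pseudo user-ratings vector v_u of Eq. (3.7): keep the
    user's actual rating where available, otherwise fall back on the
    content-based prediction c_ui."""
    return {
        item: actual_ratings.get(item, content_predict(item))
        for item in all_items
    }

# Toy example: the user rated items 1 and 3; items 2 and 4 are filled
# in by a stand-in content-based predictor that always returns 3.0.
ratings = {1: 5, 3: 2}
pseudo = build_pseudo_vector(ratings, [1, 2, 3, 4], lambda i: 3.0)
```

Applied to every user, these dense vectors form the pseudo ratings matrix V on which the collaborative step then operates.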
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with an MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
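The ingredient-scoring rule described above can be sketched in a few lines; this is an illustrative reading of the averaging step in [22], with hypothetical recipe and ingredient names:

```python
from collections import defaultdict

def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Score each ingredient as the average rating of the recipes in
    which it occurs, as in the simplistic decomposition of [22]."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in recipe_ratings.items():
        for ing in recipe_ingredients.get(recipe, []):
            totals[ing] += rating
            counts[ing] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}

# Basil occurs in a 5-rated and a 3-rated recipe, so it scores 4.0.
ratings = {"pasta_pesto": 5, "pesto_chicken": 3}
ingredients = {"pasta_pesto": ["pasta", "basil"],
               "pesto_chicken": ["chicken", "basil"]}
scores = ingredient_scores(ratings, ingredients)
```

Note that this averaging ignores quantities, preparation time, and the other variables the article lists, which is precisely why the authors call it simplistic.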
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that ingredients common to recipes with mixed ratings are not the cause of the high variation in scores. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic; these items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
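The short-term voting scheme described above can be sketched as follows; a simplified reading of [23], where the threshold values and vector contents are illustrative, not the ones used by Daily-Learner:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_story_score(new_vec, rated_stories, min_sim=0.3, max_sim=0.95):
    """Short-term model sketch: stories closer than min_sim become
    voters; the prediction is the similarity-weighted average of the
    voters' scores. A voter above max_sim marks the story as 'known'.
    With no voters, the story is deferred to the long-term model."""
    voters = [(cosine(new_vec, vec), score)
              for vec, score in rated_stories
              if cosine(new_vec, vec) >= min_sim]
    if not voters:
        return None, False  # defer to the long-term model
    known = any(sim >= max_sim for sim, _ in voters)
    prediction = (sum(sim * s for sim, s in voters) /
                  sum(sim for sim, _ in voters))
    return prediction, known

rated = [((1.0, 0.0), 4.0), ((0.0, 1.0), 2.0)]
pred, known = predict_story_score((1.0, 0.0), rated)
```

In this toy run the new story matches the first rated story exactly, so it receives that story's score and is flagged as already known.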
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured from the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
1 https://www.python.org/
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as
    sim(a, b) = [ Σ_{p∈P} (r_{a,p} - r̄_a)(r_{b,p} - r̄_b) ] /
                [ √(Σ_{p∈P} (r_{a,p} - r̄_a)²) · √(Σ_{p∈P} (r_{b,p} - r̄_b)²) ]        (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly r̄_a and r̄_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
    pred(u, a) = [ Σ_{b∈N} sim(a, b) · (r_{u,b} - r̄_b) ] / [ Σ_{b∈N} sim(a, b) ]      (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
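The profile-building rule just described can be sketched as follows; this is a minimal illustration, assuming a hypothetical flat feature encoding (e.g. "ingredient:basil") rather than the sparse vector layout used in YoLP:

```python
def update_profile(profile, recipe_features, rating, positive_min=4):
    """Add a recipe's features to the user profile as binary values
    when the rating is positive (4 or 5); lower ratings leave the
    profile untouched, as described in the text."""
    if rating >= positive_min:
        for feature in recipe_features:
            profile[feature] = 1
    return profile

profile = {}
update_profile(profile, {"category:soup", "ingredient:basil"}, 5)
update_profile(profile, {"ingredient:beef"}, 2)  # not positive: ignored
```

Recommendation then reduces to ranking the restaurant's recipe vectors by cosine similarity against this binary profile vector.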
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
    Rating =  avgTotal + 0.5   if similarity > 0.8
              avgTotal         otherwise                              (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of the feature F_k is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
    IRF_k = log(M / M_k)                                              (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
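Eq. (4.4) is straightforward to compute over a dataset; the following is a small sketch with hypothetical recipe identifiers, where, as stated above, the IRF value is used directly as the feature weight since FF is fixed at 1:

```python
import math

def irf_weights(recipes):
    """Compute IRF_k = log(M / M_k) from Eq. (4.4): M is the total
    number of recipes, M_k the number of recipes containing feature k.
    With FF fixed at 1, these values serve directly as feature weights."""
    M = len(recipes)
    counts = {}
    for ingredients in recipes.values():
        for k in set(ingredients):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}

# 'salt' appears in every recipe, so its weight is log(1) = 0: a
# ubiquitous ingredient says nothing about individual preference.
recipes = {"r1": ["salt", "basil"], "r2": ["salt"], "r3": ["salt", "egg"]}
weights = irf_weights(recipes)
```

The zero weight for ubiquitous ingredients mirrors the IDF intuition from text retrieval that motivated the FF-IRF scheme in the first place.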
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determines the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:
    B = (A - min(A)) / (max(A) - min(A)) · (D - C) + C                (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A - minimum value of A), the user average was used as default for the recommendation.
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:
    Rating =  average rating + standard deviation   if similarity ≥ U
              average rating                        if L ≤ similarity < U
              average rating - standard deviation   if similarity < L        (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
    Number of users                       24,741         8,117
    Number of food items                 226,025        14,976
    Number of rating events              956,826        86,574
    Number of ratings above avg          726,467        46,588
    Number of groups                         108            68
    Number of ingredients                  5,074           338
    Number of categories                      28            14
    Sparsity on the ratings matrix         0.02%         0.07%
    Avg rating values                       4.68          3.34
    Avg number of ratings per user         38.67         10.67
    Avg number of ratings per item          4.23          5.78
    Avg number of ingredients per item      8.57          3.71
    Avg number of categories per item       2.33          0.60
    Avg number of food groups per item      0.87          0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine, with more accuracy, the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization for the two datasets is presented, after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com   5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way the ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data on the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, a k-fold cross-validation method was used, holding out one fold of observations as the validation set and using the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using a different fold as the validation set each time, until every observation has been used for validation once. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
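The 5-fold splitting procedure can be sketched as follows; a generic illustration of the technique rather than the exact partitioning code used in this work:

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Shuffle the rating events and split them into k folds; each
    fold serves once as the validation set (20% for k=5) while the
    remaining folds form the training set (80%)."""
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i
                    for e in fold]
        yield training, validation

# 100 toy rating events -> five 80/20 training/validation splits.
splits = list(k_fold_splits(range(100), k=5))
```

Every event appears in exactly one validation fold, so averaging the error over the five runs uses each rating for evaluation exactly once.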
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
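The two deviation measures are standard and can be stated in a few lines; the sample values below are illustrative, not results from this work:

```python
def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between the
    predicted and the actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but squaring penalizes
    large deviations more heavily."""
    return (sum((p - a) ** 2
                for p, a in zip(predicted, actual)) / len(actual)) ** 0.5

err_mae = mae([4.0, 3.0], [5.0, 3.0])    # -> 0.5
err_rmse = rmse([4.0, 3.0], [5.0, 3.0])  # -> ~0.7071
```

Because RMSE weights outliers more strongly, a method can improve its MAE while worsening its RMSE, which is why both are reported in the tables that follow.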
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                   Epicurious            Food.com
                                   MAE      RMSE        MAE      RMSE
    YoLP Content-based component   0.6389   0.8279      0.3590   0.6536
    YoLP Collaborative component   0.6454   0.8678      0.3761   0.6834
    User Average                   0.6315   0.8338      0.4077   0.6207
    Item Average                   0.7701   1.0930      0.4385   0.7043
    Combined Average               0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                         Epicurious                            Food.com
                                 Observation     Observation           Observation     Observation
                                 User Average    Fixed Threshold       User Average    Fixed Threshold
                                 MAE     RMSE    MAE     RMSE          MAE     RMSE    MAE     RMSE
    User Avg + User Standard
    Deviation                    0.8217  1.0606  0.7759  1.0283        0.4448  0.6812  0.4287  0.6624
    Item Avg + Item Standard
    Deviation                    0.8914  1.1550  0.8388  1.1106        0.4561  0.7251  0.4507  0.7207
    User/Item Avg + User and
    Item Standard Deviation      0.8304  1.0296  0.7824  0.9927        0.4390  0.6506  0.4324  0.6449
    Min-Max                      0.8539  1.1533  0.7721  1.0705        0.6648  0.9847  0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as a threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                         Epicurious            Food.com
                                         MAE      RMSE        MAE      RMSE
    Ingredients + Cuisine + Dietaries    0.7824   0.9927      0.4324   0.6449
    Ingredients + Cuisine                0.7915   1.0012      0.4384   0.6502
    Ingredients + Dietary                0.7874   0.9986      0.4342   0.6468
    Cuisine + Dietary                    0.8266   1.0616      0.4324   0.7087
    Ingredients                          0.7932   1.0054      0.4411   0.6537
    Cuisine                              0.8553   1.0810      0.5357   0.7431
    Dietary                              0.8772   1.0807      0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous Section we concluded that the method combination that performed the best was
the following
bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors
bull Use the combination of both user and item average ratings and standard deviations to trans-
form the similarity value into a rating value
From this point on all the experiments performed use this method combination
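The chosen combination for building the prototypes can be illustrated with a minimal sketch, assuming sparse feature vectors stored as dictionaries and the usual Rocchio weights β and γ; whether a rating of exactly 3 counts as positive or negative is not specified in the text, so treating it as negative here is an assumption.

```python
# Hedged sketch: build a user's Rocchio prototype vector from rated recipes,
# using the fixed rating threshold of 3 to split positive/negative observations.
# Feature vectors, beta and gamma values are illustrative assumptions.

def build_prototype(rated_recipes, threshold=3, beta=1.0, gamma=0.5):
    """rated_recipes: list of (feature_vector: dict, rating: int) pairs."""
    pos = [vec for vec, r in rated_recipes if r > threshold]
    neg = [vec for vec, r in rated_recipes if r <= threshold]
    prototype = {}
    # positive examples pull the prototype towards them, negatives push away
    for group, weight in ((pos, beta), (neg, -gamma)):
        if not group:
            continue
        for vec in group:
            for feature, w in vec.items():
                prototype[feature] = prototype.get(feature, 0.0) + weight * w / len(group)
    return prototype

user_ratings = [
    ({"chicken": 0.8, "garlic": 0.5}, 5),   # liked
    ({"chicken": 0.6, "curry": 0.9}, 4),    # liked
    ({"tofu": 0.9, "curry": 0.4}, 2),       # disliked
]
proto = build_prototype(user_ratings)
```

Features of liked recipes end up with positive weights and features of disliked recipes with negative ones, as in Eq. 2.6.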
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietary), so the same results can be observed in the respective line of Table 5.3.
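The per-feature storage described above can be sketched as follows: the three vectors are kept separately and merged on demand, so feature combinations can be tested without rebuilding any prototype. All names below are illustrative, not the thesis implementation.

```python
# Hedged sketch: a user's prototype kept as three separate vectors
# (ingredients, cuisine, dietary) that are merged at recommendation time.

def merge_prototype(stored, active_features):
    """stored: dict feature_name -> sparse vector (dict); returns one merged dict."""
    merged = {}
    for name in active_features:
        for key, w in stored[name].items():
            # prefix keys with the feature name to keep the vocabularies disjoint
            merged[(name, key)] = w
    return merged

stored = {
    "ingredients": {"garlic": 0.7},
    "cuisine": {"italian": 0.4},
    "dietary": {"vegetarian": 0.9},
}
full = merge_prototype(stored, ["ingredients", "cuisine", "dietary"])
no_dietary = merge_prototype(stored, ["ingredients", "cuisine"])
```

Dropping a feature from a test run then amounts to omitting its name from the active list.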
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases still need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
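The rule of Eq. 4.6 translates directly to code, with U and L exposed as parameters so they can be varied as in this test; the defaults follow the initial values from the text.

```python
# Direct transcription of the Eq. 4.6 similarity-to-rating rule; the
# average and standard deviation arguments stand for whichever user/item
# combination is being tested.

def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

Sweeping `lower` towards 0 reproduces the lower-threshold test; `upper` is varied in the test that follows.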
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
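The opposing movement of the two metrics described above can be reproduced with a small numeric illustration; the rating lists below are invented and only show why more exact hits with occasional large misses can lower MAE while raising RMSE.

```python
import math

# MAE averages absolute deviations; RMSE squares them first, so large
# individual misses weigh more heavily in RMSE than in MAE.

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

true = [4, 4, 4, 4]
many_exact_big_miss = [4, 4, 4, 1]   # three exact predictions, one large miss
always_close = [3, 5, 3, 5]          # never exact, but always off by only one

mae_a, rmse_a = mae(many_exact_big_miss, true), rmse(many_exact_big_miss, true)
mae_b, rmse_b = mae(always_close, true), rmse(always_close, true)
```

The first predictor wins on MAE while losing on RMSE, mirroring the behaviour observed when lowering the threshold U.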
5.5 Standard Deviation Impact on Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
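The "average value of the points in that proximity" line can be obtained, for instance, by binning users by their rating standard deviation and averaging the absolute error inside each bin; the bin width and sample points below are assumptions, not the thesis's actual smoothing method.

```python
# Hedged sketch of a binned trend line over the (standard deviation,
# absolute error) scatter plots described above.

def binned_average(points, bin_width=0.25):
    """points: list of (std_dev, abs_error); returns {bin_start: mean_error}."""
    bins = {}
    for std, err in points:
        start = int(std / bin_width) * bin_width
        bins.setdefault(start, []).append(err)
    return {start: sum(errs) / len(errs) for start, errs in bins.items()}

users = [(0.1, 0.2), (0.2, 0.4), (1.1, 0.8), (1.2, 1.0)]
trend = binned_average(users)
```

Plotting the bin means against the bin starts yields a line like the one overlaid on Figures 5.6 and 5.7.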
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes

In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes marking a threshold where the recommendation error stagnates.
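The learning-curve protocol above can be sketched as a simple loop, moving one review at a time from the validation set into the training set and recording the MAE on what remains; the mean-rating "model" below is a toy stand-in for the real Rocchio component, used only to make the loop concrete.

```python
# Hedged sketch of the learning-curve simulation: after each added
# training review, the model is rebuilt and the MAE on the remaining
# validation reviews is recorded.

def learning_curve(reviews, build_model, predict):
    """reviews: chronologically ordered list of (recipe, rating) pairs."""
    curve = []
    for n in range(1, len(reviews)):
        train, validation = reviews[:n], reviews[n:]
        model = build_model(train)
        errors = [abs(predict(model, recipe) - rating)
                  for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))  # MAE after n training reviews
    return curve

# toy stand-ins (assumptions, not the thesis component)
build_mean = lambda train: sum(r for _, r in train) / len(train)
predict_mean = lambda model, recipe: model

reviews = [("recipe_a", 4), ("recipe_b", 5), ("recipe_c", 4), ("recipe_d", 4)]
curve = learning_curve(reviews, build_mean, predict_mean)
```

Averaging such curves over the selected user groups produces the plots in Figures 5.8 through 5.10.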
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested, further exploring the breaking down of recipes into ingredients presented in [22] and using more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors, and, adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of the day (i.e. lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
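This proposed variant can be sketched as follows, assuming one prototype vector per rating value; the example class vectors are invented for illustration.

```python
import math

# Hedged sketch of the class-vector variant proposed above: one Rocchio
# prototype per rating value, with the predicted rating given by the most
# similar class vector, removing the similarity-to-rating transformation.

def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_rating(class_vectors, recipe_vector):
    """class_vectors: dict rating -> prototype vector (dict)."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vector))

classes = {
    5: {"chocolate": 1.0, "dessert": 0.8},   # features of recipes this user rated 5
    1: {"liver": 1.0},                        # features of recipes this user rated 1
}
rating = predict_rating(classes, {"chocolate": 0.9})
```

The predicted rating is simply the label of the class whose prototype is closest to the recipe.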
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203-259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98-105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76-87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325-341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551-585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412-420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395-403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127-134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381-386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147-180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995.
List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]  4
2.2 Comparing user ratings [2]  11
2.3 Monolithic hybridization design [2]  13
2.4 Parallelized hybridization design [2]  13
2.5 Pipelined hybridization designs [2]  13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]  14
2.7 Evaluating recommended items [2]  15
3.1 Recipe - ingredient breakdown and reconstruction  21
3.2 Normalized MAE score for recipe recommendation [22]  22
4.1 System architecture  26
4.2 Item-to-item collaborative recommendation  26
4.3 Distribution of Epicurious rating events per rating values  32
4.4 Distribution of Food.com rating events per rating values  32
4.5 Epicurious distribution of the number of ratings per number of users  33
5.1 10-fold cross-validation example  36
5.2 Lower similarity threshold variation test using the Epicurious dataset  39
5.3 Lower similarity threshold variation test using the Food.com dataset  40
5.4 Upper similarity threshold variation test using the Epicurious dataset  40
5.5 Upper similarity threshold variation test using the Food.com dataset  41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset  42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset  43
5.8 Learning curve using the Epicurious dataset, up to 40 rated recipes  44
5.9 Learning curve using the Food.com dataset, up to 100 rated recipes  44
5.10 Learning curve using the Food.com dataset, up to 500 rated recipes  45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Based on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.

Recommendation systems are already very popular in e-commerce websites and in online services related to movies, music, books, social bookmarking, and product sales in general, and new ones are appearing every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is a topic of great interest.

In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods adapted to food recommendation is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.

The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive, based on his consumer behaviour, recommendations specifically adjusted to his personal taste. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described, the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the related work in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems;
• Content-based recommendation systems;
• Collaborative recommendation systems;
• Hybrid recommendation systems.
In Figure 2.1 it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.

Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods

Content-based recommendation methods basically consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known;

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance associated between the term and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{i,j} = f_{i,j} / max_z f_{z,j}        (2.1)

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = log( N / n_i )        (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

w_{i,j} = TF_{i,j} × IDF_i        (2.3)
It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when giving a weight to a term.
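Eqs. 2.1-2.3 can be combined into a short sketch; the toy corpus below is an assumption, used only to make the weighting concrete.

```python
import math

# Minimal TF-IDF per Eqs. 2.1-2.3: term frequency normalized by the most
# frequent term in the document, multiplied by the inverse document frequency.

def tf_idf(docs):
    """docs: list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    df = {}  # document frequency n_i for each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        max_f = max(counts.values())  # max_z f_{z,j}
        weights.append({term: (f / max_f) * math.log(n_docs / df[term])
                        for term, f in counts.items()})
    return weights

docs = [["garlic", "chicken", "garlic"], ["tofu", "curry"], ["chicken", "curry"]]
w = tf_idf(docs)
```

A term occurring in every document gets IDF log(1) = 0, which is exactly why frequent keywords stop discriminating between documents.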
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, cosine normalization is usually employed:

w_{i,j} = TF-IDF_{i,j} / sqrt( Σ_{z=1}^{K} (TF-IDF_{z,j})^2 )        (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = ( Σ_k w_ka · w_kb ) / ( sqrt(Σ_k w_ka²) · sqrt(Σ_k w_kb²) )    (2.5)
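As an illustration of Eqs. (2.1) to (2.5), the following Python sketch (not part of the thesis; the function names are mine) computes cosine-normalized TF-IDF vectors for a small corpus of tokenized documents and compares them with cosine similarity:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Cosine-normalized TF-IDF vectors (Eqs. 2.1-2.4) for tokenized docs."""
    n = len(docs)
    df = Counter()                      # n_i: documents containing each keyword
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_f = max(counts.values())    # max_z f_zj
        # TF-IDF weight per keyword (Eq. 2.3)
        w = {t: (f / max_f) * math.log(n / df[t]) for t, f in counts.items()}
        # cosine normalization (Eq. 2.4)
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({t: v / norm for t, v in w.items()} if norm else w)
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors (Eq. 2.5)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Two documents with no keywords in common get a similarity of zero, while a document compared with itself yields the maximum value of one.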
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. (2.5)). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector →c_i = (w_1i, ..., w_|T|i) for each class c_i, being T the vocabulary composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_ki = β · ( Σ_{d_j ∈ POS_i} w_kj ) / |POS_i| − γ · ( Σ_{d_j ∈ NEG_i} w_kj ) / |NEG_i|    (2.6)

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_kj is the TF-IDF weight for term k in document d_j. Parameters β and γ control the influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector →c_i and the document vector →d_j.
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
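A minimal Python sketch of Rocchio's method (illustrative, not from the thesis) builds a prototype vector per class from the TF-IDF vectors of its positive and negative examples, as in Eq. (2.6); the default β and γ values below are common choices from the literature, not values prescribed by the text. A plain inner product is used as the similarity, though cosine similarity (Eq. (2.5)) could be substituted:

```python
from collections import defaultdict

def rocchio_prototype(pos_docs, neg_docs, beta=16.0, gamma=4.0):
    """Prototype vector for one class (Eq. 2.6), from sparse TF-IDF vectors."""
    proto = defaultdict(float)
    for doc in pos_docs:                       # positive examples add weight
        for term, w in doc.items():
            proto[term] += beta * w / len(pos_docs)
    for doc in neg_docs:                       # negative examples subtract it
        for term, w in doc.items():
            proto[term] -= gamma * w / len(neg_docs)
    return dict(proto)

def dot(a, b):
    # Inner-product similarity between sparse vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def rocchio_classify(doc, prototypes):
    """Assign the document to the class with the most similar prototype."""
    return max(prototypes, key=lambda c: dot(prototypes[c], doc))
```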
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): the probability of observing a document in class c;
• P(d|c): the probability of observing the document d given a class c;
• P(d): the probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = P(c) · P(d|c) / P(d)    (2.7)

When performing classification, each document d is assigned to the class c_j with the highest probability:

argmax_{c_j} P(c_j) · P(d|c_j) / P(d)    (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of words in a document. The second, typically referred to as the multinomial event model, accounts for the number of times the words appear in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j | d_i) = P(c_j) · Π_{t_k ∈ V ∩ d_i} P(t_k | c_j)^N(d_i, t_k)    (2.9)

In the formula, N(d_i, t_k) represents the number of times the word, or term, t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document are used.
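The multinomial model of Eq. (2.9) can be sketched in a few lines of Python (illustrative code, not from the thesis). Laplace smoothing and log-space scoring are standard additions of mine, used to avoid zero probabilities and floating-point underflow:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labeled_docs):
    """labeled_docs: list of (tokens, class). Returns class priors P(c) and
    Laplace-smoothed term probabilities P(t|c) over the training vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in labeled_docs:
        class_docs[c] += 1
        term_counts[c].update(tokens)
        vocab.update(tokens)
    n = len(labeled_docs)
    priors = {c: class_docs[c] / n for c in class_docs}
    cond = {}
    for c in class_docs:
        total = sum(term_counts[c].values())
        cond[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab))
                   for t in vocab}
    return priors, cond

def classify_nb(tokens, priors, cond):
    """argmax_c log P(c) + sum_k N(d, t_k) * log P(t_k|c)  (Eq. 2.9)."""
    counts = Counter(tokens)
    def score(c):
        s = math.log(priors[c])
        for t, n in counts.items():
            if t in cond[c]:          # unseen words are ignored
                s += n * math.log(cond[c][t])
        return s
    return max(priors, key=score)
```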
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is inefficiency at classification time, due to the fact that they have no training phase and all the computation is made at classification time.
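A short Python sketch of the nearest neighbor procedure just described (illustrative, not from the thesis), using cosine similarity over sparse VSM vectors and a majority vote among the k most similar stored items:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(item, training, k=3):
    """training: list of (vector, label) pairs kept in memory.
    All work happens here, at classification time: rank every stored item
    by similarity and vote among the k nearest neighbors."""
    neighbors = sorted(training, key=lambda p: cosine(p[0], item),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```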
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation.

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4     ?
User1     3      1      2      3     3
User2     4      3      4      3     5
User3     3      3      1      5     4
User4     1      5      5      2     1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = ( Σ_{s∈S} r_as · r_bs ) / ( sqrt(Σ_{s∈S} r_as²) · sqrt(Σ_{s∈S} r_bs²) )    (2.10)

In the formula, r_as is the rating that user a gave to item s, and r_bs is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = ( Σ_{s∈S} (r_as − r̄_a)(r_bs − r̄_b) ) / sqrt( Σ_{s∈S} (r_as − r̄_a)² · Σ_{s∈S} (r_bs − r̄_b)² )    (2.11)

In the formula, r̄_a and r̄_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = r̄_a + ( Σ_{b∈N} sim(a, b) · (r_bp − r̄_b) ) / ( Σ_{b∈N} sim(a, b) )    (2.12)

In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
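The whole worked example can be reproduced with a short Python sketch (illustrative code, not from the thesis) implementing the Pearson similarity of Eq. (2.11) and the prediction function of Eq. (2.12) over a dict-of-dicts ratings database like Table 2.1:

```python
import math

def pearson(ratings, a, b):
    """Pearson correlation (Eq. 2.11) over the items both users rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    ra = sum(ratings[a].values()) / len(ratings[a])   # user a's average rating
    rb = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common)) * \
          math.sqrt(sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(ratings, a, p, n_neighbors=3):
    """Predicted rating of item p for user a (Eq. 2.12), using the
    n most similar users that have rated p as the neighborhood N."""
    candidates = [b for b in ratings if b != a and p in ratings[b]]
    neighbors = sorted(candidates, key=lambda b: pearson(ratings, a, b),
                       reverse=True)[:n_neighbors]
    ra = sum(ratings[a].values()) / len(ratings[a])
    num = sum(pearson(ratings, a, b) *
              (ratings[b][p] - sum(ratings[b].values()) / len(ratings[b]))
              for b in neighbors)
    den = sum(pearson(ratings, a, b) for b in neighbors)
    return ra + num / den if den else ra
```

On the data of Table 2.1, Alice and User1 come out highly correlated, and the prediction for Item5 lands above Alice's own average of 4, as the explanation above suggests.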
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem; most of them use the hybrid recommendation approach presented in the next section, while others use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by the user, or dramas liked by the user), in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied with two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = tp / (tp + fp)    (2.13)

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = tp / (tp + fn)    (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = (1/n) · Σ_{i=1..n} |p_i − r_i|    (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = sqrt( (1/n) · Σ_{i=1..n} (p_i − r_i)² )    (2.16)
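The four evaluation measures above translate directly into code. The following Python sketch (illustrative, not from the thesis) implements Eqs. (2.13) to (2.16):

```python
import math

def precision(tp, fp):
    """Fraction of recommended items that are relevant (Eq. 2.13)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant items that were recommended (Eq. 2.14)."""
    return tp / (tp + fn)

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15) over paired rating lists."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); large deviations weigh more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual))
                     / len(actual))
```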
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be awarded to anyone who presented an algorithm with an accuracy (RMSE) improvement of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:

I+_k = FF_k × IRF_k    (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D    (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = log( M / M_k )    (3.3)
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
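Equations (3.1) to (3.3) can be sketched in Python as follows (illustrative code, not from the paper; the function and parameter names are mine). Recipes are modelled simply as sets of ingredient names:

```python
import math
from collections import Counter

def favourite_ingredients(cooked_recipes, all_recipes, period_days):
    """Rank ingredients by I+_k = FF_k * IRF_k (Eqs. 3.1-3.3).
    cooked_recipes: ingredient sets the user cooked during the period;
    all_recipes: ingredient sets of the whole recipe database."""
    m = len(all_recipes)
    use_freq = Counter()            # F_k: times ingredient k was used
    for recipe in cooked_recipes:
        use_freq.update(recipe)
    recipe_freq = Counter()         # M_k: database recipes containing k
    for recipe in all_recipes:
        recipe_freq.update(set(recipe))
    scores = {k: (f / period_days) * math.log(m / recipe_freq[k])
              for k, f in use_freq.items() if recipe_freq[k]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

As with IDF, an ingredient that appears in every recipe in the database (e.g., salt) gets an IRF of zero and therefore never ranks as a favourite, however often the user cooks with it.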
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I+_k were computed. The F-measure is computed as follows:

F-measure = ( 2 × Precision × Recall ) / ( Precision + Recall )    (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent: 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and on the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σ_k = sqrt( (1/n) · Σ_{i=1..n} (g_k(i) − ḡ_k)² )    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I−_k, respectively):

Score(R) = Σ_{k∈R} I_k · W_k    (3.6)
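A Python sketch of Eqs. (3.5) and (3.6) follows (illustrative, not from the paper). Since the exact form of the weight W_k is not reproduced in the text above, the sketch assumes one plausible deviation-based choice, namely how far the recipe's quantity sits above the ingredient's average, measured in standard deviations:

```python
import math

def ingredient_stats(quantities):
    """Mean and standard deviation (Eq. 3.5) of an ingredient's quantity
    across the recipes that contain it."""
    n = len(quantities)
    mean = sum(quantities) / n
    std = math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)
    return mean, std

def recipe_score(recipe_quantities, preferences, stats):
    """Score(R) = sum_k I_k * W_k (Eq. 3.6). W_k here is an ASSUMED
    deviation-based weight: 1 plus the quantity's z-score above the mean."""
    score = 0.0
    for k, g in recipe_quantities.items():
        mean, std = stats[k]
        w = 1.0 + (g - mean) / std if std else 1.0
        score += preferences.get(k, 0.0) * w
    return score
```

With this choice, a disliked ingredient (negative I_k) present in a larger-than-usual quantity pulls the recipe's score down further than the same ingredient in its usual quantity, matching the behaviour the extension aims for.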
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:

v_ui = r_ui, if user u rated item i
v_ui = c_ui, otherwise    (3.7)
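Eq. (3.7) amounts to a simple fallback rule, sketched below in Python (illustrative, not from the paper): each user's pseudo vector takes the actual rating where one exists and the content-based prediction otherwise, and stacking these vectors yields the dense matrix V used by the collaborative step:

```python
def pseudo_ratings(actual, content_predicted, items):
    """Dense pseudo user-ratings vector v_u (Eq. 3.7): the user's actual
    rating where available, the content-based prediction otherwise."""
    return {i: actual[i] if i in actual else content_predicted[i]
            for i in items}

def pseudo_matrix(all_actual, all_content, items):
    """Dense pseudo ratings matrix V: one pseudo vector per user."""
    return {u: pseudo_ratings(all_actual[u], all_content[u], items)
            for u in all_actual}
```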
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the ingredient ratings after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that the items common to recipes with mixed ratings are not the cause of the high variation in scores. The results of the study are represented in Figure 3.2, using the normalized
21
Figure 32 Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should make great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
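The short-term voting scheme described above can be sketched as follows; the threshold values, function names, and exact voting details are illustrative assumptions, not Daily-Learner's actual code.

```python
import math

def cosine(u, v):
    """Cosine similarity between sparse TF-IDF vectors (term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_story(new_story, rated_stories, t_min=0.3, t_max=0.95):
    """rated_stories: list of (tfidf_vector, score) pairs.
    Stories above t_min vote; a voter above t_max marks the story as known."""
    voters = [(cosine(new_story, vec), score) for vec, score in rated_stories]
    voters = [(s, sc) for s, sc in voters if s > t_min]
    if not voters:
        return None, "unclassified"  # handed over to the long-term model
    label = "known" if any(s > t_max for s, _ in voters) else "new"
    prediction = sum(s * sc for s, sc in voters) / sum(s for s, _ in voters)
    return prediction, label
```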
This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

1https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)

2http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html

In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating r_{u,b} for each item b is weighted according to the similarity between b and the target item a, and the weighted sum is normalized by the sum of the similarities.
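A minimal sketch of this item-to-item scheme, assuming ratings stored as nested dictionaries, might look as follows. For simplicity, the item means are computed over the co-raters only and only positively correlated neighbours vote; both are assumptions of this sketch rather than details taken from the text.

```python
import math

def item_pearson(ratings, a, b):
    """Pearson correlation between items a and b (Eq. 4.1), computed over
    the users who rated both; ratings: dict user -> {item: rating}."""
    co = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if len(co) < 2:
        return 0.0
    mean_a = sum(ratings[u][a] for u in co) / len(co)
    mean_b = sum(ratings[u][b] for u in co) / len(co)
    num = sum((ratings[u][a] - mean_a) * (ratings[u][b] - mean_b) for u in co)
    den = math.sqrt(sum((ratings[u][a] - mean_a) ** 2 for u in co) *
                    sum((ratings[u][b] - mean_b) ** 2 for u in co))
    return num / den if den else 0.0

def predict_rating(ratings, u, a):
    """Similarity-weighted average of user u's own ratings (Eq. 4.2)."""
    sims = [(item_pearson(ratings, a, b), r) for b, r in ratings[u].items() if b != a]
    num = sum(s * r for s, r in sims if s > 0)
    den = sum(s for s, r in sims if s > 0)
    return num / den if den else None
```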
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
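A sketch of this profile construction, representing the binary vectors as sets (a convenient equivalent for cosine similarity over binary vectors); the names and the threshold parameter are illustrative assumptions.

```python
import math

def build_profile(rated_recipes, positive_threshold=4):
    """Union of the features of recipes the user rated positively
    (>= positive_threshold), i.e. a binary profile vector as a set."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile.update(features)
    return profile

def cosine_binary(profile, recipe_features):
    """Cosine similarity between two binary vectors given as sets."""
    recipe = set(recipe_features)
    if not profile or not recipe:
        return 0.0
    return len(profile & recipe) / math.sqrt(len(profile) * len(recipe))
```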
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
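Eq. 4.4 can be computed in a single pass over the dataset; a small sketch with hypothetical names:

```python
import math

def irf_weights(recipes):
    """IRF_k = log(M / M_k), where M is the number of recipes and M_k the
    number of recipes containing ingredient k; recipes: list of ingredient sets."""
    M = len(recipes)
    counts = {}
    for ingredients in recipes:
        for k in ingredients:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```

An ingredient that appears in every recipe gets weight 0, so it carries no preference information, which is the intended behaviour of the inverse frequency.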
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
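The update rule just described, with equal unit weights for positive and negative observations and a fixed rating threshold (the Epicurious-style variant, where ratings at or above the threshold count as positive), can be sketched as follows; function and parameter names are hypothetical.

```python
from collections import defaultdict

def build_prototype(rated_recipes, feature_weights, threshold=3):
    """Rocchio-style prototype: add each recipe's feature weights (e.g. IRF)
    for a positive observation (rating >= threshold), subtract them otherwise.
    rated_recipes: iterable of (feature_list, rating) pairs."""
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for f in features:
            prototype[f] += sign * feature_weights.get(f, 0.0)
    return dict(prototype)
```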
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes and containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
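A sketch of the Min-Max mapping (Eq. 4.5). The fallback for a degenerate interval is shown here as the midpoint of the target range, whereas the text falls back to the user average; that substitution is an assumption of this sketch.

```python
def min_max(a, a_min, a_max, c, d):
    """Map a value a from the range [a_min, a_max] onto [c, d] (Eq. 4.5)."""
    if a_max == a_min:
        return (c + d) / 2.0  # degenerate interval: fall back to the midpoint
    return (a - a_min) / (a_max - a_min) * (d - c) + c
```

Per user, `[a_min, a_max]` would be the observed similarity interval and `[c, d]` the observed rating interval.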
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} average\ rating + standard\ deviation & \text{if } similarity \geq U \\ average\ rating & \text{if } L \leq similarity < U \\ average\ rating - standard\ deviation & \text{if } similarity < L \end{cases}    (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the

3http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                    Food.com    Epicurious
Number of users                       24,741         8,117
Number of food items                 226,025        14,976
Number of rating events              956,826        86,574
Number of ratings above average      726,467        46,588
Number of groups                         108            68
Number of ingredients                  5,074           338
Number of categories                      28            14
Sparsity of the ratings matrix         0.02%         0.07%
Average rating value                    4.68          3.34
Avg. number of ratings per user        38.67         10.67
Avg. number of ratings per item         4.23          5.78
Avg. number of ingredients per item     8.57          3.71
Avg. number of categories per item      2.33          0.60
Avg. number of food groups per item     0.87          0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
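The case analysis of Eq. 4.6, with the initial thresholds, translates directly into code; the function name is hypothetical.

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: bump the average by one standard deviation for high
    similarity, subtract it for low similarity, keep the average otherwise."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```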
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community4. The second dataset is composed of data crawled from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
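The filtering step can be sketched as a pass over the rating tuples. This simple version counts items before users; whether the original filtering was applied iteratively is not stated in the text, so this ordering is an assumption.

```python
from collections import Counter

def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """Keep recipes rated at least 4 times and, among the remaining events,
    users with at least 6 ratings; ratings: list of (user, item, rating)."""
    item_counts = Counter(item for _, item, _ in ratings)
    kept = [r for r in ratings if item_counts[r[1]] >= min_item_ratings]
    user_counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if user_counts[r[0]] >= min_user_ratings]
```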
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4http://www.food.com
5http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

6http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the k-fold cross-validation method was used: the data is partitioned into k folds, and each fold in turn serves as the validation set while the remaining observations form the training set. To reduce variability, this process is repeated k times, using a different fold as the validation set each time, and the validation results are averaged over the k repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, i.e., 5-fold cross-validation was used. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
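The 5-fold split described above can be sketched as follows; `k_fold_splits` is a hypothetical helper, and the shuffling seed is an assumption added for reproducibility.

```python
import random

def k_fold_splits(events, k=5, seed=0):
    """Yield (train, validation) pairs; each event lands in the
    validation set exactly once across the k folds."""
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, validation
```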
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
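The two metrics follow directly from their definitions in Section 2.2:

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```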
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                Epicurious            Food.com
                                MAE       RMSE        MAE       RMSE
YoLP content-based component    0.6389    0.8279      0.3590    0.6536
YoLP collaborative component    0.6454    0.8678      0.3761    0.6834
User Average                    0.6315    0.8338      0.4077    0.6207
Item Average                    0.7701    1.0930      0.4385    0.7043
Combined Average                0.6628    0.8572      0.4180    0.6250
Table 5.2: Test results (MAE / RMSE)

                                        Epicurious                          Food.com
                                        Obs. User Avg    Obs. Fixed Thr.    Obs. User Avg    Obs. Fixed Thr.
User Avg + User Std. Deviation          0.8217 / 1.0606  0.7759 / 1.0283    0.4448 / 0.6812  0.4287 / 0.6624
Item Avg + Item Std. Deviation          0.8914 / 1.1550  0.8388 / 1.1106    0.4561 / 0.7251  0.4507 / 0.7207
User/Item Avg + User and Item Std. Dev. 0.8304 / 1.0296  0.7824 / 0.9927    0.4390 / 0.6506  0.4324 / 0.6449
Min-Max                                 0.8539 / 1.1533  0.7721 / 1.0705    0.6648 / 0.9847  0.6303 / 0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious            Food.com
                                     MAE       RMSE        MAE       RMSE
Ingredients + Cuisine + Dietaries    0.7824    0.9927      0.4324    0.6449
Ingredients + Cuisine                0.7915    1.0012      0.4384    0.6502
Ingredients + Dietary                0.7874    0.9986      0.4342    0.6468
Cuisine + Dietary                    0.8266    1.0616      0.4324    0.7087
Ingredients                          0.7932    1.0054      0.4411    0.6537
Cuisine                              0.8553    1.0810      0.5357    0.7431
Dietary                              0.8772    1.0807      0.4579    0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the user prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated: in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
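The merging step can be as simple as a dictionary union over the enabled feature types; the storage layout shown here is an assumption.

```python
def merge_prototypes(stored, active_feature_types):
    """stored: dict feature_type -> {feature: weight} for one user; merge
    only the feature types enabled for the current feature test."""
    merged = {}
    for ftype in active_feature_types:
        merged.update(stored.get(ftype, {}))
    return merged
```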
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} average\ rating + standard\ deviation & \text{if } similarity \geq U \\ average\ rating & \text{if } L \leq similarity < U \\ average\ rating - standard\ deviation & \text{if } similarity < L \end{cases}
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed. As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = \begin{cases} average\ rating + standard\ deviation & \text{if } similarity \geq U \\ average\ rating & \text{if } similarity < U \end{cases}    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it is predicting the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates from all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
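The MAE/RMSE trade-off described above can be reproduced with a small numerical example: two prediction sets with the same total absolute error get the same MAE, but the one that concentrates its error in a single large miss gets a higher RMSE. The data below is illustrative only, not taken from the datasets:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: squaring amplifies the large deviations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Two hypothetical systems with the same total absolute error (4.0 over 4 predictions):
actual = [4, 4, 4, 4]
many_small_misses = [3, 3, 3, 3]   # off by 1 on every prediction
one_large_miss = [4, 4, 4, 0]      # exact 3 times, off by 4 once

print(mae(actual, many_small_misses), rmse(actual, many_small_misses))  # 1.0 1.0
print(mae(actual, one_large_miss), rmse(actual, one_large_miss))        # 1.0 2.0
```

This is exactly the behaviour observed in the threshold test: more exact predictions lower the MAE, while the remaining larger misses raise the RMSE.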
5.5 Standard Deviation Impact on the Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimension of this dataset and
the lighter density of points in the graph towards the higher values of standard deviation, there
probably was not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve

The Rocchio algorithm bases the recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a determined amount of
reviews are made. In order to perform this test, first the datasets were analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable amount of users to average the
recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is not a clear number of rated recipes that marks
a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
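The simulation protocol described above, moving one review per round from the validation set into the training set, can be sketched as follows. `train_fn` and `error_fn` are placeholders standing in for the thesis' prototype-vector construction and error measurement; they are assumptions for illustration, not code from this work:

```python
import random

def learning_curve(user_reviews, train_fn, error_fn, max_train=40):
    """Simulate continuous learning for one user.

    Each round moves one review from the validation set into the training set,
    rebuilds the model with train_fn, and records the error of that model on
    the remaining validation reviews with error_fn."""
    reviews = list(user_reviews)
    random.shuffle(reviews)          # avoid any ordering bias in the dataset
    train, validation = [], reviews
    errors = []
    for _ in range(min(max_train, len(reviews) - 1)):
        train.append(validation.pop(0))      # one more rated recipe is "learned"
        model = train_fn(train)
        errors.append(error_fn(model, validation))
    return errors
```

Averaging the `errors` lists over the selected group of users (the 71 Epicurious users, or the 1571 and 269 Food.com groups) yields the curves plotted in Figs. 5.8-5.10.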
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation
was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking of recipes down into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various
approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the recommendation
system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results to transform
the similarity value into a rating value. These approaches combined returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Being two datasets with very different characteristics, not improving the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, which were chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors, and adding the major difference in the dataset
sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the validation of the studied approaches. Since
there are very few studies related to food recommendations, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day
(i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the
impact that these features have on the recommendation is also another interesting point to approach
in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related with a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
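This class-vector idea could be sketched as follows. The sketch is hypothetical: the per-rating class structure, the sparse-dict representation, and all names are assumptions for illustration, since this variant was not implemented in this work:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts of feature -> weight)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(class_vectors, recipe_vector):
    """class_vectors maps each observed rating value to the prototype vector
    built from the user's recipes rated with that value. The predicted rating
    is simply the class whose prototype is most similar to the recipe."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vector))
```

The class with the highest similarity directly yields the predicted rating, removing the similarity-to-rating transformation step.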
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
List of Figures

2.1 Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4] . . . 4
2.2 Comparing user ratings [2] . . . 11
2.3 Monolithic hybridization design [2] . . . 13
2.4 Parallelized hybridization design [2] . . . 13
2.5 Pipelined hybridization designs [2] . . . 13
2.6 Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4] . . . 14
2.7 Evaluating recommended items [2] . . . 15
3.1 Recipe - ingredient breakdown and reconstruction . . . 21
3.2 Normalized MAE score for recipe recommendation [22] . . . 22
4.1 System Architecture . . . 26
4.2 Item-to-item collaborative recommendation . . . 26
4.3 Distribution of Epicurious rating events per rating values . . . 32
4.4 Distribution of Food.com rating events per rating values . . . 32
4.5 Epicurious distribution of the number of ratings per number of users . . . 33
5.1 10-Fold Cross-Validation example . . . 36
5.2 Lower similarity threshold variation test using Epicurious dataset . . . 39
5.3 Lower similarity threshold variation test using Food.com dataset . . . 40
5.4 Upper similarity threshold variation test using Epicurious dataset . . . 40
5.5 Upper similarity threshold variation test using Food.com dataset . . . 41
5.6 Mapping of the user's absolute error and standard deviation from the Epicurious dataset . . . 42
5.7 Mapping of the user's absolute error and standard deviation from the Food.com dataset . . . 43
5.8 Learning Curve using the Epicurious dataset up to 40 rated recipes . . . 44
5.9 Learning Curve using the Food.com dataset up to 100 rated recipes . . . 44
5.10 Learning Curve using the Food.com dataset up to 500 rated recipes . . . 45
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering high amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services
related to movies, music, books, social bookmarking, and product sales in general, and new
ones are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can apply to food recommendation, is a topic of great
interest.
In this work, the applicability of content-based methods in personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy level of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. This work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed. The algorithm's learning curve and the impact of the standard deviation on the
recommendation error were also analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure

The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation
systems, introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation
approaches are analysed, and the features that are interesting in the context of personalized food
recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the
developed system are described; the recommendation methods are explained in detail, and the datasets
are introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given and a few topics for future work are discussed.

1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the related work discussed in the following chapter.
These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems

Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based
recommendations. Content-based approaches, instead, relate more to classical Information Retrieval
based methods, and focus on keywords as content descriptors to generate recommendations. Because
of this, content-based methods are very popular when recommending documents, news articles,
or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and preferences.
Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify the solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative
and content-based systems, such as the well-known cold-start problem that is explained later
in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
In the rest of this section, some of the most popular approaches for content-based and collaborative
methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods

Content-based recommendation methods basically consist of matching up the attributes of an object
with a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known;
• Unstructured Data: attributes do not have a well-known set of values; content analyzers are
usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free-text. As
mentioned previously, content needs to be analysed, and the information in it needs to be translated
into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the
relevance associated between it and the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency
measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_ij = f_ij / max_z f_zj    (2.1)

where, for a document j and a keyword i, f_ij corresponds to the number of times that i appears in j.
This value is divided by the maximum f_zj, which corresponds to the maximum frequency observed
over all keywords z in the document j.
Keywords that are present in various documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = log(N / n_i)    (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in
which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword i in a document j as:
w_ij = TF_ij × IDF_i    (2.3)
It is important to notice that TF-IDF does not identify the context where the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document, and their occurrence in other documents, are taken into consideration when giving a
weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents
from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:

w_ij = TF-IDF_ij / √( Σ_{z=1..K} (TF-IDF_zj)² )    (2.4)
With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = ( Σ_k w_ka · w_kb ) / ( √(Σ_k w_ka²) · √(Σ_k w_kb²) )    (2.5)
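Eqs. 2.1 to 2.5 can be illustrated with a minimal sketch that computes normalized TF-IDF vectors and cosine similarities over tokenized documents. This is an illustrative implementation, not optimized and not the code used in this work:

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Compute cosine-normalized TF-IDF weight vectors (Eqs. 2.1-2.4)
    for a list of tokenized documents (lists of terms)."""
    N = len(documents)
    df = Counter()                      # number of documents containing each term
    for doc in documents:
        df.update(set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        max_f = max(counts.values())
        # TF (Eq. 2.1) times IDF (Eq. 2.2) gives the raw weight (Eq. 2.3)
        raw = {t: (f / max_f) * math.log(N / df[t]) for t, f in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))  # Eq. 2.4
        vectors.append({t: w / norm for t, w in raw.items()} if norm else raw)
    return vectors

def cosine_similarity(a, b):
    """Eq. 2.5 over sparse weight vectors represented as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Note how a term occurring in every document receives an IDF of zero, so it contributes nothing to the similarity, which is exactly the behaviour Eq. 2.2 is designed to produce.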
Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system according
to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors
of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then classified
according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector c_i = (w_1i, ..., w_|T|i) for each
class c_i, being T the vocabulary, i.e., the set of distinct terms in the training set. The weight
for each term is given by the following formula:

w_ki = β · ( Σ_{d_j ∈ POS_i} w_kj / |POS_i| ) − γ · ( Σ_{d_j ∈ NEG_i} w_kj / |NEG_i| )    (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for
class c_i, and w_kj is the TF-IDF weight for term k in document d_j. Parameters β and γ control the
influence of the positive and negative examples. The document d_j is assigned to the class c_i with
the highest similarity value between the prototype vector c_i and the document vector d_j.
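Eq. 2.6 can be sketched over sparse TF-IDF vectors as follows. The β and γ defaults below are illustrative assumptions, not values prescribed by the algorithm or used in this work:

```python
def rocchio_prototype(pos_docs, neg_docs, beta=16.0, gamma=4.0):
    """Build the prototype vector of Eq. 2.6 from TF-IDF weight vectors,
    each represented as a dict of term -> weight. Positive examples pull
    each term weight up (scaled by beta), negative examples push it down
    (scaled by gamma), both averaged over their respective example sets."""
    prototype = {}
    for docs, sign, coeff in ((pos_docs, 1.0, beta), (neg_docs, -1.0, gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                prototype[term] = prototype.get(term, 0.0) + sign * coeff * w / len(docs)
    return prototype
```

Classification then amounts to computing the cosine similarity (Eq. 2.5) between a new document vector and each class prototype, and picking the class with the highest value.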
Although this method has an intuitive justification, it does not have any theoretical underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learning,
a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied
extensively [8].
Classifiers

Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine
learning methods are other examples of techniques also used to perform content-based recommendation.
These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging
to a class c, using a set of probabilities previously calculated from the observed data, or
training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying the Bayes theorem:

P(c|d) = P(c) · P(d|c) / P(d)    (2.7)
When performing classification each document d is assigned to the class cj with the highest
probability
argmaxcjP(cj)P(d|cj)
P(d)(28)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant and irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, indicating whether or not the word appears in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j|d_i) = P(c_j) · Π_{t_k ∈ V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)}    (2.9)

In the formula, N(d_i,t_k) represents the number of times the word (or term) t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k ∈ V_{d_i}) are used.
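The multinomial model in Eq. (2.9) can be sketched as follows, using log-probabilities for numerical stability. The toy documents are hypothetical, and Laplace smoothing is an added assumption, not something prescribed by the equation:

```python
import math
from collections import Counter

def train_multinomial_nb(docs):
    """docs: list of (tokens, label). Returns priors, smoothed P(t|c), vocabulary."""
    labels = [y for _, y in docs]
    priors = {c: labels.count(c) / len(docs) for c in set(labels)}
    counts = {c: Counter() for c in priors}
    for tokens, y in docs:
        counts[y].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    cond = {}
    for c in priors:  # Laplace-smoothed P(t_k | c_j)
        total = sum(counts[c].values()) + len(vocab)
        cond[c] = {t: (counts[c][t] + 1) / total for t in vocab}
    return priors, cond, vocab

def classify(tokens, priors, cond, vocab):
    """argmax_c log P(c) + sum over tokens of log P(t_k|c); P(d) is dropped."""
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(cond[c][t]) for t in tokens if t in vocab)
    return max(priors, key=score)

docs = [("great tasty food".split(), "relevant"),
        ("tasty cheap meal".split(), "relevant"),
        ("terrible bland food".split(), "irrelevant")]
model = train_multinomial_nb(docs)
print(classify("tasty meal".split(), *model))  # "relevant"
```

Iterating over the raw token list counts each term as many times as it occurs, which realizes the exponent N(d_i, t_k) of Eq. (2.9).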
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of documents, the tree's internal nodes represent labelled terms, branches originating from them are labelled according to tests on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they have no training phase and all the computation is made at classification time.
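The k-nearest-neighbor classification step described above can be sketched as follows, using cosine similarity over VSM-style vectors. The vectors, labels, and the value of k are purely illustrative:

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def knn_classify(item, training, k=3):
    """training: list of (vector, label). Majority vote over the k most similar items."""
    nearest = sorted(training, key=lambda t: cosine(item, t[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [([1.0, 0.0], "relevant"), ([0.9, 0.1], "relevant"),
            ([0.0, 1.0], "irrelevant"), ([0.1, 0.9], "irrelevant")]
print(knn_classify([0.8, 0.2], training, k=3))  # "relevant"
```

Note that all the work happens inside `knn_classify`: there is no training step, which is exactly the classification-time inefficiency pointed out above.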
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user, based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since changes in item characteristics do not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by the users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation.

          Item1  Item2  Item3  Item4  Item5
Alice       5      3      4      4      ?
User1       3      1      2      3      3
User2       4      3      4      3      5
User3       3      3      1      5      4
User4       1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in the set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an m-dimensional space, where m represents the number of rated items in common. The similarity results from computing the cosine of the angle between the two vectors:

Similarity(a, b) = Σ_{s∈S} r_{a,s} · r_{b,s} / ( √(Σ_{s∈S} r²_{a,s}) · √(Σ_{s∈S} r²_{b,s}) )    (2.10)

In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average rating of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:

Figure 2.2: Comparing user ratings [2]

sim(a, b) = Σ_{s∈S} (r_{a,s} − r̄_a)(r_{b,s} − r̄_b) / √( Σ_{s∈S} (r_{a,s} − r̄_a)² · Σ_{s∈S} (r_{b,s} − r̄_b)² )    (2.11)

In the formula, r̄_a and r̄_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = r̄_a + Σ_{b∈N} sim(a, b) · (r_{b,p} − r̄_b) / Σ_{b∈N} sim(a, b)    (2.12)

In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that have rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to, or subtracted from, Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
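Putting Eqs. (2.11) and (2.12) together, the prediction of Alice's rating for Item5 in Table 2.1 can be sketched as follows. This is a minimal illustration, not the system's implementation: computing each user's mean over the co-rated items inside the Pearson similarity, and using absolute similarity values in the denominator, are implementation choices:

```python
import math

def pearson(a, b, common):
    """Eq. (2.11) over the items both users rated."""
    ma = sum(a[s] for s in common) / len(common)
    mb = sum(b[s] for s in common) / len(common)
    num = sum((a[s] - ma) * (b[s] - mb) for s in common)
    den = math.sqrt(sum((a[s] - ma) ** 2 for s in common)
                    * sum((b[s] - mb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(target, others, item):
    """Eq. (2.12): similarity-weighted deviations from each neighbor's mean."""
    avg = sum(target.values()) / len(target)
    num = den = 0.0
    for o in others:
        common = [s for s in target if s in o and s != item]
        w = pearson(target, o, common)
        num += w * (o[item] - sum(o.values()) / len(o))
        den += abs(w)
    return avg + num / den if den else avg

alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
others = [{"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
          {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
          {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
          {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1}]
print(round(predict(alice, others, "Item5"), 2))
```

Note that User4, who rates in the opposite way to Alice, receives a negative similarity, and his low rating for Item5 therefore pushes the prediction upwards.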
Different recommendation systems may take different approaches in order to implement the user similarity calculations and the rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (i.e., training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem, most of them using the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be associated with content features (e.g., comedies liked by a user, or dramas liked by a user), in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation, and the weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, otherwise use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is then exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good, or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]

Precision = tp / (tp + fp)    (2.13)

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = tp / (tp + fn)    (2.14)

Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = (1/n) · Σ_{i=1}^{n} |p_i − r_i|    (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = √( (1/n) · Σ_{i=1}^{n} (p_i − r_i)² )    (2.16)
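The four measures above can be sketched directly from their definitions. The counts and ratings below are hypothetical:

```python
import math

def precision(tp, fp):
    """Eq. (2.13): fraction of recommended items that are relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (2.14): fraction of relevant items that were recommended."""
    return tp / (tp + fn)

def mae(predicted, actual):
    """Eq. (2.15): mean absolute deviation of predicted ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Eq. (2.16): like MAE, but penalizing large deviations more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# 8 recommended items, 6 relevant; 3 relevant items were missed
print(precision(6, 2), recall(6, 3))   # 0.75 0.666...
print(mae([4.0, 3.5, 2.0], [5, 3, 2]))  # 0.5
print(rmse([4.0, 3.5, 2.0], [5, 3, 2]))
```

With the same errors, RMSE (≈0.645) is larger than MAE (0.5), illustrating the heavier weight it gives to the 1-point miss.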
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy (RMSE) improvement of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.
¹http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I⁺_k, an equation based on the idea of TF-IDF is used:

I⁺_k = FF_k × IRF_k    (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D    (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the inverse recipe frequency IRF_k, which uses the total number of recipes M and the number of recipes M_k that contain ingredient k:

IRF_k = log (M / M_k)    (3.3)

The user's disliked ingredients I⁻_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
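Eqs. (3.1) to (3.3) can be sketched as follows. The cooking history, recipe database, and period D are hypothetical toy data:

```python
import math
from collections import Counter

def favourite_ingredients(cooked, all_recipes, days):
    """Rank ingredients by I+_k = FF_k * IRF_k (Eqs. 3.1-3.3)."""
    freq = Counter(ing for recipe in cooked for ing in recipe)  # F_k
    M = len(all_recipes)
    scores = {}
    for ing, f_k in freq.items():
        M_k = sum(1 for r in all_recipes if ing in r)  # recipes containing k
        scores[ing] = (f_k / days) * math.log(M / M_k)
    return sorted(scores, key=scores.get, reverse=True)

all_recipes = [{"chicken", "rice"}, {"chicken", "pepper"}, {"rice", "salt"},
               {"beef", "salt"}, {"beef", "pepper", "rice"}]
cooked = [{"chicken", "rice"}, {"chicken", "pepper"}]  # user's 30-day history
print(favourite_ingredients(cooked, all_recipes, days=30)[0])  # "chicken"
```

Here "chicken" wins both through frequency of use (FF) and through appearing in relatively few recipes overall (IRF), mirroring the TF-IDF intuition.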
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely, and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I⁺_k, were computed. The F-measure is computed as follows:

F-measure = 2 × Precision × Recall / (Precision + Recall)    (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual users' favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I⁺_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether or not the ingredients exist in it. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
¹http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and on the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

σ_k = √( (1/n) · Σ_{i=1}^{n} (g_k(i) − ḡ_k)² )    (3.5)

In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score is then computed considering the weights W_k and the user's liked and disliked ingredient scores I_k (i.e., I⁺_k and I⁻_k, respectively):

Score(R) = Σ_{k∈R} I_k · W_k    (3.6)
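A sketch of this quantity-aware scoring follows. The source does not fully specify how the deviation σ_k maps to the weight W_k, so the linear mapping below, like all the data, is an illustrative assumption:

```python
import math

def std_dev(quantities):
    """Eq. (3.5): standard deviation of an ingredient's quantity across recipes."""
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe_qty, ingredient_pref, ingredient_stats):
    """Eq. (3.6) with an assumed weight: W_k grows when the recipe uses more of
    ingredient k than usual, measured in standard deviations."""
    score = 0.0
    for k, qty in recipe_qty.items():
        mean, sigma = ingredient_stats[k]
        w_k = 1.0 + (qty - mean) / sigma if sigma else 1.0
        score += ingredient_pref.get(k, 0.0) * w_k
    return score

# Hypothetical stats (mean, sigma): pepper varies little, potato varies a lot
ingredient_stats = {"pepper": (5.0, 2.0), "potato": (150.0, 80.0)}
prefs = {"pepper": -0.5, "potato": 0.3}   # the user dislikes pepper
spicy = {"pepper": 9.0, "potato": 150.0}  # heavy on pepper
mild = {"pepper": 3.0, "potato": 150.0}
print(recipe_score(mild, prefs, ingredient_stats) >
      recipe_score(spicy, prefs, ingredient_stats))  # True
```

As intended by the extension, the recipe with more of the disliked ingredient now scores lower, even though both recipes contain the same set of ingredients.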
The approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering, and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of the deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:

v_{u,i} = r_{u,i}, if user u rated item i;  c_{u,i}, otherwise    (3.7)
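The construction of the dense pseudo ratings matrix can be sketched as follows. The content-based predictor is stubbed out as a hypothetical per-user function, and the ratings are toy data:

```python
def pseudo_ratings_vector(user_ratings, items, content_predict):
    """Eq. (3.7): keep actual ratings, fill the rest with content-based predictions."""
    return {i: user_ratings.get(i, content_predict(i)) for i in items}

def pseudo_ratings_matrix(all_ratings, items, content_predictors):
    """Dense matrix V: one pseudo user-ratings vector per user."""
    return {u: pseudo_ratings_vector(r, items, content_predictors[u])
            for u, r in all_ratings.items()}

items = ["m1", "m2", "m3"]
all_ratings = {"ana": {"m1": 5}, "rui": {"m2": 2, "m3": 4}}
# Hypothetical content-based predictors, one per learned user profile
content_predictors = {"ana": lambda i: 3, "rui": lambda i: 4}
dense = pseudo_ratings_matrix(all_ratings, items, content_predictors)
print(dense["ana"])  # {'m1': 5, 'm2': 3, 'm3': 3}
```

The resulting matrix has no missing entries, which is what lets the subsequent Pearson-based collaborative step operate without the sparsity problem.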
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix V is created, and the similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with an MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
20
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
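The simple recipe-to-ingredient breakdown used by this content-based strategy can be sketched as follows. The ratings are hypothetical, and predicting a recipe as the mean of its known ingredient scores is the averaging scheme described above:

```python
from collections import defaultdict

def ingredient_scores(rated_recipes):
    """Score each ingredient as the average rating of the recipes containing it."""
    totals = defaultdict(list)
    for ingredients, rating in rated_recipes:
        for ing in ingredients:
            totals[ing].append(rating)
    return {ing: sum(r) / len(r) for ing, r in totals.items()}

def predict_recipe(ingredients, scores):
    """Predict a recipe rating from the user's known ingredient scores."""
    known = [scores[i] for i in ingredients if i in scores]
    return sum(known) / len(known) if known else None

rated = [({"pasta", "tomato"}, 5), ({"pasta", "cream"}, 3), ({"tomato", "beef"}, 4)]
scores = ingredient_scores(rated)
print(predict_recipe({"tomato", "cream"}, scores))  # (4.5 + 3) / 2 = 3.75
```

This also makes the limitation discussed in Section 3.1 concrete: two recipes with the same ingredient set always receive the same prediction, regardless of proportions.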
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The two strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, the rationale being that ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE scores for recipe recommendation [22]

This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve: as mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction: in some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information, and will not help him gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic, but not similar in content, should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory; when classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
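This voting scheme can be sketched as follows (a minimal illustration, not Daily-Learner's actual code; the threshold values and the dict-based sparse vectors are assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_short_term(new_story, rated_stories, min_sim=0.3, max_sim=0.95):
    """Weighted average over 'voting' stories; returns (label, score)."""
    voters = [(cosine(new_story, s), score) for s, score in rated_stories]
    voters = [(sim, score) for sim, score in voters if sim >= min_sim]
    if not voters:
        return "unclassified", None          # fall back to the long-term model
    if any(sim >= max_sim for sim, _ in voters):
        return "known", None                 # user already saw this event
    score = sum(sim * sc for sim, sc in voters) / sum(sim for sim, _ in voters)
    return "recommend", score
```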
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.1
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

1 https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

    sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

    pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (4.2)

2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of the similarities.
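Eqs. 4.1 and 4.2 can be illustrated with a small sketch (variable and function names are ours; restricting `predict` to positively correlated neighbours is a common practical guard, not part of Eq. 4.2):

```python
import math

def pearson_item_sim(ratings, a, b):
    """Eq. 4.1: Pearson correlation between items a and b over shared raters."""
    shared = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if len(shared) < 2:
        return 0.0
    avg_a = sum(ratings[u][a] for u in shared) / len(shared)
    avg_b = sum(ratings[u][b] for u in shared) / len(shared)
    num = sum((ratings[u][a] - avg_a) * (ratings[u][b] - avg_b) for u in shared)
    den = (math.sqrt(sum((ratings[u][a] - avg_a) ** 2 for u in shared))
           * math.sqrt(sum((ratings[u][b] - avg_b) ** 2 for u in shared)))
    return num / den if den else 0.0

def predict(ratings, u, a, item_avg):
    """Eq. 4.2, restricted to positively correlated neighbours."""
    num = den = 0.0
    for b, r_ub in ratings[u].items():
        if b == a:
            continue
        s = pearson_item_sim(ratings, a, b)
        if s > 0:
            num += s * (r_ub - item_avg[b])
            den += s
    return num / den if den else 0.0
```

Here `ratings` is assumed to be a nested dict mapping each user to their `{item: rating}` pairs, and `item_avg` holds each item's average rating.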
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the features of the recipes that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
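A minimal sketch of this profile construction and ranking (the feature encoding such as `"cat:pasta"` and the positive-rating threshold of 4 are illustrative assumptions):

```python
import math

def build_profile(rated_recipes, threshold=4):
    """Union of the features of all recipes the user rated >= threshold."""
    profile = {}
    for features, rating in rated_recipes:
        if rating >= threshold:
            for f in features:
                profile[f] = 1.0
    return profile

def cosine(u, v):
    """Cosine similarity between two dict-based sparse vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(profile, recipes):
    """Order a restaurant's recipes from most to least similar to the profile."""
    scored = [(cosine(profile, {f: 1.0 for f in feats}), name)
              for name, feats in recipes]
    return [name for _, name in sorted(scored, reverse=True)]
```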
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

    Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature Fk is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
    IRF_k = \log\frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
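Eq. 4.4 can be computed in a single pass over the recipe collection (a sketch; the logarithm base is not specified in the text, so the natural logarithm is assumed):

```python
import math
from collections import Counter

def irf_weights(recipes):
    """recipes: list of feature sets. Returns {feature: log(M / M_k)} (Eq. 4.4)."""
    M = len(recipes)
    counts = Counter(f for feats in recipes for f in set(feats))
    return {f: math.log(M / mk) for f, mk in counts.items()}
```

Rarer features receive higher weights, so the prototype vectors built from these weights emphasize the more distinctive ingredients of a user's rated recipes.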
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:3

    B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

    Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
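Eq. 4.6 translates directly into code (a sketch; `avg` and `std` stand for whichever of the three combinations is being tested):

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average by one standard deviation at the extremes."""
    if sim >= upper:
        return avg + std
    if sim < lower:
        return avg - std
    return avg
```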
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Foodcom    Epicurious
    Number of users                       24741          8117
    Number of food items                 226025         14976
    Number of rating events              956826         86574
    Number of ratings above avg          726467         46588
    Number of groups                        108            68
    Number of ingredients                  5074           338
    Number of categories                     28            14
    Sparsity on the ratings matrix        0.02%         0.07%
    Avg rating value                       4.68          3.34
    Avg number of ratings per user        38.67         10.67
    Avg number of ratings per item         4.23          5.78
    Avg number of ingredients per item     8.57          3.71
    Avg number of categories per item      2.33          0.60
    Avg number of food groups per item     0.87          0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community.4 The second dataset is composed of data crawled from a website named Epicurious.5 This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
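The filtering step can be sketched as follows (the iterative repetition until stability is our assumption, since removing items can push users below their threshold and vice versa):

```python
from collections import Counter

def filter_sparse(events, min_item_ratings=4, min_user_ratings=6):
    """Repeatedly drop recipes rated no more than 3 times and users with no
    more than 5 ratings, over (user, item, rating) events, until stable."""
    changed = True
    while changed:
        item_counts = Counter(item for _, item, _ in events)
        user_counts = Counter(user for user, _, _ in events)
        kept = [(u, i, r) for u, i, r in events
                if item_counts[i] >= min_item_ratings
                and user_counts[u] >= min_user_ratings]
        changed = len(kept) != len(events)
        events = kept
    return events
```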
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

bull Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

bull Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Foodcom rating events per rating value
bull Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented. In Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.6 Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value was 5, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
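A minimal 5-fold split over the rating events might look like this (a sketch; the shuffling and the seed are assumptions, not details from the text):

```python
import random

def k_fold(events, k=5, seed=42):
    """Yield (train, validation) pairs; each fold holds out ~1/k of the events."""
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, validation
```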
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the prediction values). In the simplest case, the validation set presents information in the following format:

bull User identification: userID

bull Item identification: itemID

bull Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
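The two error measures reduce to a few lines (a sketch):

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))
```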
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                      Epicurious            Foodcom
                                    MAE      RMSE       MAE      RMSE
    YoLP Content-based component    0.6389   0.8279     0.3590   0.6536
    YoLP Collaborative component    0.6454   0.8678     0.3761   0.6834
    User Average                    0.6315   0.8338     0.4077   0.6207
    Item Average                    0.7701   1.0930     0.4385   0.7043
    Combined Average                0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                              Epicurious                          Foodcom
                                    Observation      Observation       Observation      Observation
                                    User Average     Fixed Threshold   User Average     Fixed Threshold
                                    MAE     RMSE     MAE     RMSE      MAE     RMSE     MAE     RMSE
    User Avg + User Std. Dev.       0.8217  1.0606   0.7759  1.0283    0.4448  0.6812   0.4287  0.6624
    Item Avg + Item Std. Dev.       0.8914  1.1550   0.8388  1.1106    0.4561  0.7251   0.4507  0.7207
    User/Item Avg + User and
    Item Std. Dev.                  0.8304  1.0296   0.7824  0.9927    0.4390  0.6506   0.4324  0.6449
    Min-Max                         0.8539  1.1533   0.7721  1.0705    0.6648  0.9847   0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                          Epicurious           Foodcom
                                        MAE      RMSE       MAE      RMSE
    Ingredients + Cuisine + Dietaries   0.7824   0.9927     0.4324   0.6449
    Ingredients + Cuisine               0.7915   1.0012     0.4384   0.6502
    Ingredients + Dietary               0.7874   0.9986     0.4342   0.6468
    Cuisine + Dietary                   0.8266   1.0616     0.4324   0.7087
    Ingredients                         0.7932   1.0054     0.4411   0.6537
    Cuisine                             0.8553   1.0810     0.5357   0.7431
    Dietary                             0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

bull use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

bull use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
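With one stored vector per feature type, testing a feature combination reduces to merging the chosen vectors before computing the cosine similarity (a sketch; dict-based sparse vectors with disjoint feature keys per type are assumed):

```python
def merge(*vectors):
    """Merge per-feature-type prototype vectors into a single sparse vector.
    Feature keys are disjoint across types, so a plain dict update suffices."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

# Testing, e.g., Ingredients + Cuisine is then just:
#   cosine(merge(ingredients_vec, cuisine_vec), recipe_vector)
```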
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

    Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset
    Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)

Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a certain number
of reviews. To perform this test, the datasets were first analysed to find a group of users with
enough rated recipes to study the improvements in the recommendation. The Epicurious dataset
contains 71 users that rated over 40 recipes. This was the highest threshold chosen for this dataset,
in order to keep a considerable number of users to average the recommendation errors over (see
Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of
this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test
was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so in each
round an additional review is added to the training set and removed from the validation set, in order
to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small size of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
a threshold where the recommendation error stagnates.
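The round-by-round procedure described above can be sketched as follows. `train_and_predict` is a hypothetical stand-in for building the prototype vectors from the training reviews and predicting ratings for the validation ones, not the thesis implementation:

```python
def learning_curve(user_reviews, train_and_predict, max_rounds):
    """Simulate continuous learning: in each round, one more review per
    user is moved from the validation set into the training set, and the
    mean absolute error over the remaining validation reviews is recorded."""
    curve = []
    for n in range(1, max_rounds + 1):
        errors = []
        for reviews in user_reviews:              # one (recipe, rating) list per user
            train, validation = reviews[:n], reviews[n:]
            if not validation:
                continue
            predictions = train_and_predict(train, validation)
            errors.extend(abs(p - rating)
                          for p, (_, rating) in zip(predictions, validation))
        curve.append(sum(errors) / len(errors))   # MAE after n training reviews
    return curve
```

Averaging the error over a fixed group of users, as done for the 71, 1571, and 269 user groups above, yields curves like those in Figures 5.8 to 5.10.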
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation
was explored. Using the well-known Rocchio algorithm, several approaches were tested,
further exploring the breaking down of recipes into ingredients presented in [22] and using more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity
value returned by the algorithm into a rating value, needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best
results used a fixed threshold to differentiate positive and negative observations. The combination
of both user and item average ratings and standard deviations was the best approach found to
transform the similarity value into a rating value. Combined, these approaches returned the best
performance values of the experimental recommendation component.
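As an illustration of such a transformation (the exact combination used in the experiments is not reproduced here), one plausible mapping from a [0, 1] similarity to a 1-5 rating could blend the user and item statistics as follows; the formula itself is an assumption for illustration only:

```python
def similarity_to_rating(sim, user_mean, user_std, item_mean, item_std):
    """One plausible mapping from a [0, 1] similarity to a 1-5 rating,
    combining the user and item average ratings and standard deviations.
    Hypothetical sketch: the similarity shifts a baseline (the blended
    averages) by up to a blended deviation in either direction."""
    baseline = (user_mean + item_mean) / 2
    spread = (user_std + item_std) / 2
    rating = baseline + (2 * sim - 1) * spread   # sim = 0.5 keeps the baseline
    return max(1.0, min(5.0, rating))            # clamp to the rating scale
```

A similarity at the midpoint returns the blended average, while higher or lower similarities move the prediction by an amount proportional to how much the user and item ratings usually vary.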
After determining the best approach to adapt the Rocchio algorithm to food recommendation,
the similarity threshold test was performed, to tune the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since the two datasets have very different characteristics, not improving on the baseline results
in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information
only contains the main ingredients, chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors; adding the major difference in dataset sizes,
these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e. datasets that contained user reviews, allowing the studied approaches to be validated.
Since there are very few studies related to food recommendation, the features that best describe
the recipes are still undefined. The feature study performed in this work, which explored all the
features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all
features combined outperforms every feature individually, as well as all pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method
explored in this work, would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e. winter/fall or summer/spring), the time of the day
(i.e. lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the
impact that these features have on the recommendation is another interesting direction to approach
in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector would be compared with the user's set of vectors, so that, according to the user's
preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute to it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105,
2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in computer science
and information systems - a landscape of research. In E-Commerce and Web Technologies,
pages 76–87. Springer Berlin Heidelberg, 2012.
[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web,
4321:325–341, 2007.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive
algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages
412–420, 1997.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender
systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation,
1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403,
1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation
algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions
on Information Systems, 22(1):143–177, 2004.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative
filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved
recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by
considering the user's preference and ingredient quantity of target recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation,
and Personalization, volume 6075 of LNCS, pages 381–386, 2010.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions
on Information Systems, 22(1):143–177, 2004.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe
recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of
LNCS, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection.
In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Squared Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services
related to movies, music, books, social bookmarking, and product sales in general, and
new ones are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can apply to food recommendation, is thus a topic of
great interest.
In this work, the applicability of content-based methods to personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods, adapted to food recommendation, is validated
with the use of performance metrics that capture the accuracy of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. This work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed: the algorithm's learning curve and the impact of the standard deviation on the
recommendation error were also analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation
systems, introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation
approaches are analysed, highlighting interesting features in the context of personalized food recommendation.
In Chapter 4, the modules that compose the architecture of the developed
system are described; the recommendation methods are explained in detail, and the datasets are
introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work, and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given, and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the related work discussed in the following chapter.
These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:
• Knowledge-based recommendation systems;
• Content-based recommendation systems;
• Collaborative recommendation systems;
• Hybrid recommendation systems.
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based recommendations.
Content-based approaches, instead, relate more to classical Information Retrieval
methods, and focus on keywords as content descriptors to generate recommendations. Because
of this, content-based methods are very popular when recommending documents, news articles,
or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and preferences.
Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements,
and the system tries to identify a solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4].
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative
and content-based systems, such as the well-known cold-start problem, which is explained later
in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative
methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an object
with a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known;
• Unstructured Data: attributes do not have a well-known set of values; content analyzers are
usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free text. As
mentioned previously, content needs to be analysed, and the information in it needs to be translated
into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the
relevance associated between it and the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency
measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}}    (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j.
This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed
over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right)    (2.2)

In the formula, N is the total number of documents, and n_i represents the number of documents in
which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword i in a document j as:
w_{ij} = TF_{ij} \times IDF_i    (2.3)
It is important to notice that TF-IDF does not identify the context where the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document, and their occurrence in other documents, are taken into consideration when giving a
weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents
from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:
w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K}(TF\text{-}IDF_{zj})^2}}    (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\sqrt{\sum_k w_{kb}^2}}    (2.5)
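The weighting and similarity computations of Eqs. (2.1) to (2.5) can be sketched as follows, over documents given as token lists; a minimal illustration, not an optimized implementation:

```python
import math

def tf_idf_vectors(docs):
    """Compute cosine-normalized TF-IDF weights (Eqs. 2.1-2.4) for a
    list of documents, each given as a list of tokens."""
    N = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    idf = {t: math.log(N / n) for t, n in df.items()}             # Eq. 2.2
    vectors = []
    for doc in docs:
        counts = {t: doc.count(t) for t in set(doc)}
        max_f = max(counts.values())
        w = {t: (f / max_f) * idf[t] for t, f in counts.items()}  # Eqs. 2.1, 2.3
        norm = math.sqrt(sum(x * x for x in w.values()))          # Eq. 2.4
        vectors.append({t: x / norm for t, x in w.items()} if norm else w)
    return vectors

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

In the food domain, the "documents" would be recipes and the "terms" their ingredients, following the analogy discussed in the introduction.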
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system according
to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors
of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then classified
according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each
class c_i, where T is the vocabulary, i.e. the set of distinct terms in the training set. The weight
for each term is given by the following formula:
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|}    (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for
class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the
influence of the positive and negative examples. A document d_j is assigned to the class c_i with
the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
Although this method has an intuitive justification, it does not have any theoretical underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learning,
a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied
extensively [8].
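Eq. (2.6) can be sketched as follows, with documents given as sparse dicts of TF-IDF weights; the default β and γ values are arbitrary illustrative choices, not values from the experiments:

```python
def rocchio_prototype(pos_docs, neg_docs, beta=16.0, gamma=4.0):
    """Build a class prototype vector (Eq. 2.6): the averaged weights of
    the positive examples, scaled by beta, minus the averaged weights of
    the negative examples, scaled by gamma. Documents are dicts mapping
    terms to TF-IDF weights."""
    prototype = {}
    for docs, sign, coeff in ((pos_docs, 1.0, beta), (neg_docs, -1.0, gamma)):
        if not docs:
            continue
        factor = sign * coeff / len(docs)   # average over the set, then scale
        for doc in docs:
            for term, weight in doc.items():
                prototype[term] = prototype.get(term, 0.0) + factor * weight
    return prototype
```

Classifying a new document then amounts to computing its cosine similarity against each class prototype and choosing the class with the highest value.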
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine
learning methods are other examples of techniques used to perform content-based recommendation.
These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging
to a class c, using a set of probabilities previously calculated from the observed data, or
training data as it is commonly called. These probabilities are:
• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying Bayes' theorem:

P(c|d) = \frac{P(c)P(d|c)}{P(d)}    (2.7)
When performing classification, each document d is assigned to the class c_j with the highest
probability:

\arg\max_{c_j} \frac{P(c_j)P(d|c_j)}{P(d)}    (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant or irrelevant
documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined
based on individual word occurrences, rather than the document as a whole. This simplification
is needed due to the fact that it is very unlikely to see the exact same document more than once;
without it, the observed data would not be enough to generate good probabilities. Although this simplification
clearly violates the conditional independence assumption, since terms in a document are
not theoretically independent from each other, experiments show that the Naive Bayes classifier has
very good results when classifying text documents. Two different models are commonly used when
working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli
event model, encodes each word as a binary attribute, relating to the appearance of
words in a document. The second, typically referred to as the multinomial event model, counts the
number of times the words appear in the document. These models see the document as a vector
of values over a vocabulary V, and they both lose the information about word order. Empirically,
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model,
especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)}    (2.9)
In the formula, N(d_i,t_k) represents the number of times the word or term t_k appears in document d_i.
Therefore, only the words from the vocabulary V that appear in the document are used.
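Eq. (2.9) is usually evaluated in log space to avoid numerical underflow over long documents; a minimal sketch, assuming the per-class term probabilities P(t|c) have already been estimated (and smoothed) from training data:

```python
import math

def multinomial_nb_score(doc_counts, class_prior, term_probs):
    """Log-space class score for one document (Eq. 2.9):
    log P(c) + sum over document terms of N(d, t) * log P(t|c).
    `doc_counts` maps terms to their counts N(d, t); `term_probs`
    maps terms to P(t|c). Terms outside the vocabulary are skipped."""
    score = math.log(class_prior)
    for term, n in doc_counts.items():
        if term in term_probs:
            score += n * math.log(term_probs[term])
    return score
```

Classification then assigns the document to the class with the highest score, mirroring Eq. (2.8) with P(d) dropped.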
Decision trees and nearest neighbor methods are other examples of important learning algorithms
used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning training data into subgroups, until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes represent labelled terms;
branches originating from them are labelled according to tests done on the weight that the term
has in the document, and leaves are then labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new,
unlabeled item, the algorithm compares it to all stored items using a similarity function, and then
determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data: the Euclidean distance metric is often chosen when working
with structured data, while for items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback
is their inefficiency at classification time, due to the fact that they do not have a training phase and all
the computation is performed at classification time.
These algorithms represent some of the most important methods used in content-based recommendation
systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed automatically
by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before; moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles,
the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular
user, based on the items previously rated by other users. This approach is also known as the
wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes or preferences, the system has
to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various algorithms
and variations, these methods are very well understood, and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the recommendation.
These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially
heuristics that make rating predictions based on the entire collection of items previously rated
by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation.

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5. Item5 is unknown to Alice, and the recommendation system needs to generate
a prediction. The set of ratings S previously mentioned represents the ratings given by User1,
User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would
give to Item5. In the simplest case, the predicted rating is computed as the average of the values
contained in set S. However, the most common approach is to use a weighted sum, where the level
of similarity between users defines the weight to use when computing the rating. For example,
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction. The similarity measure between users is used to simplify the rating estimation procedure
[12]. Two users have a high similarity value when they both rate the same group of items in an
identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional
space, where m represents the number of rated items they have in common. The similarity results
from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \sqrt{\sum_{s \in S} r_{bs}^2}}    (2.10)
In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave
to the same item s. However, this measure does not take an important factor into consideration,
namely the differences in rating behaviour between users.
In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a
similar way: the difference in rating values between the four items is practically constant. With
the cosine similarity measure, these users are considered highly similar, which may not always be
the case, since only the items they have in common are contemplated. In fact, if Alice usually rates
items with low values, we can conclude that these four items are amongst her favourites. On the
other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It
is then clear that the average ratings of each user should be analyzed in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based
collaborative filtering that takes this fact into account.
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2} \sqrt{\sum_{s \in S} (r_{bs} - \bar{r}_b)^2}}    (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (2.12)
In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users
most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's
unseen Item5 are higher or lower than their averages. The rating differences are combined using the
similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating.
The value obtained through this procedure corresponds to the predicted rating.
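To make the procedure concrete, the steps above can be sketched in Python. The ratings are those of Table 2.1; the choice of k = 2 neighbors and the helper names are illustrative assumptions, not part of the cited formulation.

```python
import math

# Ratings from Table 2.1; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(u):
    return sum(ratings[u].values()) / len(ratings[u])

def pearson(a, b):
    """Eq. (2.11): mean-centered correlation over the items both users rated."""
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = mean(a), mean(b)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = (math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common))
           * math.sqrt(sum((ratings[b][s] - rb) ** 2 for s in common)))
    return num / den if den else 0.0

def predict(a, p, k=2):
    """Eq. (2.12): the k most similar raters of p vote with their rating deviations."""
    peers = sorted(((pearson(a, b), b) for b in ratings
                    if b != a and p in ratings[b]), reverse=True)[:k]
    den = sum(sim for sim, _ in peers)
    num = sum(sim * (ratings[b][p] - mean(b)) for sim, b in peers)
    return mean(a) + num / den if den else mean(a)

print(round(predict("Alice", "Item5"), 2))  # prediction for Alice's unseen item
```

Note how User1, whose mean-centered ratings track Alice's, receives the highest weight, pulling the prediction above Alice's own average of 4.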
Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one
common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand using the precomputed similarities. Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing ratings from them [15]. Empirical
evidence has been presented suggesting that item-based algorithms can provide, with better
computational performance, comparable or better quality results than the best available user-based
collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new user problem, also known as the cold start problem, also occurs in collaborative methods.
The system must first learn the user's preferences from previously rated items in order to
perform accurate recommendations. Several techniques have been proposed to address this problem.
Most of them use the hybrid recommendation approach presented in the next section; other
techniques use strategies based on item popularity, item entropy, user personalization, and combinations
of the above [12, 17, 18]. New items also present a problem in collaborative systems: until
a new item is rated by a sufficient number of users, the recommender system will not recommend
it. Hybrid methods can also address this problem. Data sparsity is another problem that should
be considered, as the number of rated items is usually very small when compared to the number of
ratings that need to be predicted. User profile information, like age, gender, and other attributes, can
also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations.
The idea behind hybrid systems [19] is to combine two or more different elements in order to
avoid some of their shortcomings, and even reach desirable properties not present in the individual approaches.
Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or
knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative
filtering [20] is an example of this design, where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to
improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components receive the same input as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation; weights can be assigned manually or learned
dynamically. This design can be applied when two components perform well individually but
complement each other in different situations (e.g., when few ratings exist, one should recommend
popular items; otherwise, use collaborative methods).
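A minimal sketch of such a weighting scheme, with two stand-in component scorers and manually assigned weights (all names and values below are illustrative):

```python
def weighted_hybrid(scorers, weights, user, item):
    """Parallelized design: each component scores the same input independently,
    and the outputs are combined with fixed weights."""
    return sum(w * score(user, item) for score, w in zip(scorers, weights))

# Illustrative components on a 1-5 rating scale.
def popularity_score(user, item):
    return 4.0  # stand-in for "recommend popular items"

def collaborative_score(user, item):
    return 3.0  # stand-in for a neighborhood-based prediction

pred = weighted_hybrid([popularity_score, collaborative_score],
                       [0.3, 0.7], "Alice", "Item5")
print(round(pred, 2))  # 3.3
```

In a dynamic variant, the weight vector would be learned from held-out ratings rather than fixed by hand.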
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is then exploited by the principal component
to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms
can be improved by being hybridized with other techniques. It is important that the recommendation
techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem,
since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. From the
business perspective, many variables can be, and have been, studied: increase in number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of
recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall [2].
When using IR measures, recommendation is viewed as an information retrieval task, where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user, or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}    (2.14)
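As a concrete illustration of the two measures, with hypothetical recommendation and relevance lists:

```python
def precision_recall(recommended, relevant):
    """Eqs. (2.13) and (2.14): exactness and completeness of a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and relevant
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but missed
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Toy example: 3 of the 4 recommended items are actually relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "c", "e", "f"])
print(p, r)  # 0.75 0.6
```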
Measures such as the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating
for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on
larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
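Both error measures can be computed directly from paired lists of predicted and actual ratings; the values below are made up for illustration:

```python
import math

def mae(predicted, actual):
    """Eq. (2.15): mean absolute deviation between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Eq. (2.16): like MAE, but larger deviations weigh more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.5, 5.0, 2.0]
actual    = [4.0, 2.5, 3.0, 2.0]
print(mae(predicted, actual))   # 0.75
print(rmse(predicted, actual))  # larger, since the 2-point miss dominates
```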
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
¹ http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly with respect to the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients I+_k, an equation based on the idea of TF-IDF is used:
I_k^+ = FF_k \times IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D}    (3.2)
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the inverse recipe
frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain
ingredient k (M_k):
IRF_k = \log \frac{M}{M_k}    (3.3)
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
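The preference-extraction idea of Eqs. (3.1)-(3.3) can be sketched as follows; the cooking history, recipe database, and observation period are invented for illustration, and counting one use of an ingredient per cooked recipe is a simplifying assumption:

```python
import math

# Invented cooking history (each recipe is a set of ingredients) and database.
cooked = [
    {"onion", "garlic", "tomato"},
    {"onion", "garlic", "beef"},
    {"onion", "pepper"},
]
all_recipes = cooked + [{"tomato", "basil"}, {"beef", "potato"}]
D = 30  # observation period, in days

def preference(k):
    """Eqs. (3.1)-(3.3): I+_k = FF_k x IRF_k."""
    Fk = sum(k in recipe for recipe in cooked)             # uses of k in period D
    FFk = Fk / D                                           # Eq. (3.2)
    Mk = sum(k in recipe for recipe in all_recipes)        # recipes containing k
    IRFk = math.log(len(all_recipes) / Mk) if Mk else 0.0  # Eq. (3.3)
    return FFk * IRFk                                      # Eq. (3.1)

ranked = sorted({i for r in cooked for i in r}, key=preference, reverse=True)
print(ranked[0])  # the ingredient estimated as the user's favourite
```

Note the IDF-like effect: onion is used most often, but garlic ranks higher because onion is also common across the whole database.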
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses
coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted
by I+_k, were computed. The F-measure is computed as follows:
F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail because the accuracy values
obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether certain ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This does not correspond to real eating
habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considers the ingredient quantities of a target
recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
¹ http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is
obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity
of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the average
quantity of ingredient k over all the recipes in the database). According to the deviation
score, a weight W_k is assigned to the ingredient. A recipe's final score R is computed considering
the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I−_k, respectively):
Score(R) = \sum_{k \in R} (I_k \cdot W_k)    (3.6)
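A minimal sketch of these two formulas; since the mapping from the deviation score to the weight W_k is not reproduced here, the weights are passed in as given, and all quantities are invented:

```python
import math

def std_dev(quantities):
    """Eq. (3.5): dispersion of an ingredient's quantity across the recipes
    that contain it."""
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def score(recipe_ingredients, preference, weight):
    """Eq. (3.6): Score(R) = sum over k in R of I_k * W_k."""
    return sum(preference.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)

# Pepper quantities vary little across recipes, potato quantities a lot,
# so the same absolute deviation carries a different weight (values made up).
print(std_dev([2, 3, 2, 3]))          # 0.5
print(std_dev([180, 220, 200, 200]))  # much larger dispersion
```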
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering, and significantly
reduces the impact that sparse data has on prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem:
the movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed as the weighted average of deviations from the neighbors' means. Both these
methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions using the average of the ratings produced by the
pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by user u where available, and of
those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}    (3.7)
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if he has rated only a
few items. Lastly, the prediction is computed using a hybrid correlation
weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].
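The construction of the dense pseudo ratings matrix can be sketched as follows; the content-based predictor is a stand-in (it simply returns the user's own average), not the naive Bayes classifier used in [20], and the toy data is invented:

```python
# Invented toy data: two users, three movies, sparse actual ratings.
users = ["u1", "u2"]
items = ["m1", "m2", "m3"]
actual = {"u1": {"m1": 4, "m3": 2}, "u2": {"m2": 5}}

def content_prediction(u, i):
    """Stand-in for the content-based prediction c_ui (here: the user's own mean)."""
    r = list(actual[u].values())
    return sum(r) / len(r)

# Eq. (3.7): real rating where available, content-based prediction otherwise.
V = {u: {i: actual[u].get(i, content_prediction(u, i)) for i in items}
     for u in users}
print(V["u1"])  # every cell of the pseudo ratings matrix is now filled
```

Pearson similarities computed over V are then defined for every user pair, which is exactly how CBCF sidesteps data sparsity.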
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE values of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms was evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another in the approach used to compute user similarity. The hybrid recipe method
identifies a set of neighbors with user similarity based on recipe scores; in hybrid ingr, user similarity
is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy
was implemented, in which only the positive ratings for items that receive mixed ratings are
considered, under the assumption that ingredients common to recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE scores for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction: in some cases, items that are too similar
to others already seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based news article recommendation system. When
helping the user obtain more knowledge about a news topic, a certain variety should exist in the
recommendations. Items too similar to others known by the user probably carry the
same information, and will not help him gather more information about a particular news topic;
these items are then excluded from the recommendation. On the other hand, items similar in topic
but not in content should make great recommendations in the context of this system. The use of
similarity can therefore be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new, unlabeled item, the
algorithm compares it to all stored items using a similarity function, and then determines the nearest
neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it, and
there is no need to recommend a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model, and is passed to the long-term model, explained in
more detail in [23].
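The short-term voting scheme can be sketched as follows; the threshold values and story scores are illustrative, not those used in Daily-Learner:

```python
# Illustrative thresholds: voter cut-off and "known story" cut-off.
MIN_SIM, MAX_SIM = 0.3, 0.9

def predict_story(similarities_and_scores):
    """Weighted average over voting stories; returns None when there are no
    voters (fall through to the long-term model), and the label 'known' when
    a voter is too close to the new story."""
    voters = [(s, score) for s, score in similarities_and_scores if s >= MIN_SIM]
    if not voters:
        return None
    if any(s >= MAX_SIM for s, _ in voters):
        return "known"
    return sum(s * score for s, score in voters) / sum(s for s, _ in voters)

print(predict_story([(0.5, 4.0), (0.4, 2.0)]))  # weighted average of two voters
print(predict_story([(0.95, 4.0)]))             # known
```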
This issue should be taken into consideration in food recommendation as well, since users are usually
not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate the improvements in prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.
In user-to-user collaborative filtering, the similarity between a pair of users is measured by the way both
users rate the same set of items, whereas in the item-to-item approach, the similarity between a pair of items
is measured by the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach, two items are considered similar if they were rated in a similar way by the
¹ https://www.python.org
Figure 4.1: System architecture
Figure 4.2: Item-to-item collaborative recommendation²
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{bp} - \bar{r}_b)^2}}    (4.1)
where a and b are recipes, r_{ap} is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of
recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the predicted rating of item a for user u, and N is the set of items
rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted
according to the similarity between b and the target item a, and the predicted rating is normalized by the
sum of similarities.
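The two equations above can be sketched in Python; the ratings matrix is invented, and restricting the neighborhood to positively similar items is a common practical adjustment rather than part of Eq. (4.2):

```python
import math

# Illustrative ratings matrix: user -> {recipe: rating}.
R = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 2, "b": 1, "c": 5},
    "u4": {"a": 4, "c": 1},
}

def item_mean(i):
    vals = [r[i] for r in R.values() if i in r]
    return sum(vals) / len(vals)

def item_sim(a, b):
    """Eq. (4.1): Pearson correlation between two recipes over shared raters."""
    P = [u for u in R if a in R[u] and b in R[u]]
    ma, mb = item_mean(a), item_mean(b)
    num = sum((R[u][a] - ma) * (R[u][b] - mb) for u in P)
    den = (math.sqrt(sum((R[u][a] - ma) ** 2 for u in P))
           * math.sqrt(sum((R[u][b] - mb) ** 2 for u in P)))
    return num / den if den else 0.0

def predict(u, a):
    """Eq. (4.2): similarity-weighted average of the user's own ratings,
    keeping only positively similar items (a common practical restriction)."""
    sims = [(item_sim(a, b), b) for b in R[u] if b != a]
    sims = [(s, b) for s, b in sims if s > 0]
    den = sum(s for s, _ in sims)
    return sum(s * R[u][b] for s, b in sims) / den if den else item_mean(a)

print(predict("u4", "b"))  # u4 likes recipe a, and a correlates with b
```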
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending from a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values
to the profile vector.
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:
Rating = { avgTotal + 0.5   if similarity > 0.8
         { avgTotal          otherwise                    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is therefore important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature Fk is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRFk = log(M / Mk)                                        (4.4)
where M is the total number of recipes and Mk is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
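A minimal sketch of this weighting scheme (function and variable names are illustrative, not taken from the thesis; each recipe is a list of its features):

```python
import math

def irf_weights(recipes):
    # IRFk = log(M / Mk), where M is the total number of recipes and
    # Mk is the number of recipes that contain feature k (Eq. 4.4).
    m = len(recipes)
    counts = {}
    for features in recipes:
        for f in set(features):          # count each feature once per recipe
            counts[f] = counts.get(f, 0) + 1
    return {f: math.log(m / mk) for f, mk in counts.items()}
```

Rare features thus receive higher weights than features present in most recipes, mirroring the IDF intuition.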
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determines the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3: these are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
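Combining the IRF weights with the observation rule, the prototype construction can be sketched as follows (a simplification with illustrative names: every rating at or above the threshold counts as positive and everything below as negative, whereas the first approach described above additionally treats a Foodcom rating of 3 as neutral):

```python
def build_prototype(rated_recipes, irf, threshold=3):
    # Rocchio-style prototype: add the IRF weight of each feature for
    # positive observations, subtract it for negative ones, with equal
    # weight 1 for both observation types.
    prototype = {}
    for rating, features in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf.get(f, 0.0)
    return prototype
```

A feature that appears in both liked and disliked recipes thus tends toward a weight of zero, while consistently liked features accumulate positive weight.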
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:

B = (A − minimum value of A) / (maximum value of A − minimum value of A) × (D − C) + C    (4.5)

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as default for the recommendation.
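A minimal sketch of the per-user Min-Max mapping (Eq. 4.5), including the fallback to the user average when the similarity interval cannot be computed (function and parameter names are illustrative assumptions):

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    # Map a similarity value A in [sim_min, sim_max] onto the user's
    # rating range [C, D] = [rating_min, rating_max], as in Eq. 4.5.
    # When the similarity interval is degenerate (too few ratings to
    # estimate it), fall back to the user's average rating.
    if sim_max <= sim_min:
        return user_avg
    scale = (sim - sim_min) / (sim_max - sim_min)
    return scale * (rating_max - rating_min) + rating_min
```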
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = { average rating + standard deviation   if similarity >= U
         { average rating                         if L <= similarity < U
         { average rating − standard deviation    if similarity < L       (4.6)

Three different approaches were tested: using the user's rating average and the user standard deviation; using the recipe's rating average and the recipe standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization of the datasets used in the experiments.

                                      Foodcom   Epicurious
Number of users                        24741        8117
Number of food items                  226025       14976
Number of rating events               956826       86574
Number of ratings above avg           726467       46588
Number of groups                         108          68
Number of ingredients                   5074         338
Number of categories                      28          14
Sparsity of the ratings matrix         0.02%       0.07%
Avg rating value                        4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61
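The thresholded transformation of Eq. 4.6 can be sketched as follows (a minimal illustration; the function and parameter names are mine, and the average and standard deviation passed in may be the user's, the recipe's, or their combination, as in the three approaches above):

```python
def predict_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    # Eq. 4.6: bump the average rating up or down by one standard
    # deviation depending on where the similarity falls relative to
    # the upper (U) and lower (L) thresholds.
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev
```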
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1 a statistical characterization of the two datasets is presented, after the filter was applied.
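The density filter described above can be sketched as follows (a single-pass illustration with hypothetical names; the thesis does not state whether the filter was iterated until the counts stabilized, which such filters sometimes require):

```python
def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    # Drop recipes rated no more than 3 times (keep >= 4) and users
    # who rated no more than 5 times (keep >= 6). Each rating event
    # is a (user, item, rating) triple.
    item_counts, user_counts = {}, {}
    for user, item, _ in ratings:
        item_counts[item] = item_counts.get(item, 0) + 1
        user_counts[user] = user_counts.get(user, 0) + 1
    return [(u, i, r) for u, i, r in ratings
            if item_counts[i] >= min_item_ratings
            and user_counts[u] >= min_user_ratings]
```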
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

⁴ http://www.food.com
⁵ http://www.epicurious.com

Figure 4.3: Distribution of Epicurious rating events per rating values.
Figure 4.4: Distribution of Foodcom rating events per rating values.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4 and 4.5 present some graphical statistical data of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, 5 folds were used, so the process is repeated 5 times; this is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
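The fold construction can be sketched as follows (an illustrative partition scheme with hypothetical names; the thesis does not specify how observations were assigned to folds, so a simple round-robin split is assumed here):

```python
def five_fold_splits(observations, k=5):
    # Partition the rating events into k folds; each round uses one
    # fold (20%) as the validation set and the remaining folds (80%)
    # as the training set.
    folds = [observations[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [o for j, fold in enumerate(folds) if j != i
                    for o in fold]
        yield training, validation
```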
Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
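The two error measures can be stated compactly (standard definitions of MAE and RMSE, not code from the thesis):

```python
import math

def mae(predicted, actual):
    # Mean Absolute Error: average absolute deviation between
    # predicted and actual ratings.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root Mean Squared Error: like MAE, but squaring the deviations
    # places more emphasis on larger errors.
    return math.sqrt(sum((p - a) ** 2
                         for p, a in zip(predicted, actual)) / len(actual))
```

Because of the squaring, RMSE is always at least as large as MAE on the same predictions, which is relevant to the threshold-variation results discussed later in this chapter.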
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, first some baselines need to be computed. Using the 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines.

                               Epicurious          Foodcom
                              MAE     RMSE       MAE     RMSE
YoLP Content-based component  0.6389  0.8279     0.3590  0.6536
YoLP Collaborative component  0.6454  0.8678     0.3761  0.6834
User Average                  0.6315  0.8338     0.4077  0.6207
Item Average                  0.7701  1.0930     0.4385  0.7043
Combined Average              0.6628  0.8572     0.4180  0.6250

Table 5.2: Test results.

                                        Epicurious                        Foodcom
                             Observation      Observation      Observation      Observation
                             User Average     Fixed Threshold  User Average     Fixed Threshold
                             MAE     RMSE     MAE     RMSE     MAE     RMSE     MAE     RMSE
User Avg + User Std Dev      0.8217  1.0606   0.7759  1.0283   0.4448  0.6812   0.4287  0.6624
Item Avg + Item Std Dev      0.8914  1.1550   0.8388  1.1106   0.4561  0.7251   0.4507  0.7207
User/Item Avg + Std Devs     0.8304  1.0296   0.7824  0.9927   0.4390  0.6506   0.4324  0.6449
Min-Max                      0.8539  1.1533   0.7721  1.0705   0.6648  0.9847   0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2, and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features.

                                    Epicurious        Foodcom
                                   MAE     RMSE     MAE     RMSE
Ingredients + Cuisine + Dietaries  0.7824  0.9927   0.4324  0.6449
Ingredients + Cuisine              0.7915  1.0012   0.4384  0.6502
Ingredients + Dietary              0.7874  0.9986   0.4342  0.6468
Cuisine + Dietary                  0.8266  1.0616   0.4324  0.7087
Ingredients                        0.7932  1.0054   0.4411  0.6537
Cuisine                            0.8553  1.0810   0.5357  0.7431
Dietary                            0.8772  1.0807   0.4579  0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods, it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested; so, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = { average rating + standard deviation   if similarity >= U
         { average rating                         if L <= similarity < U
         { average rating − standard deviation    if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset.
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:

Rating = { average rating + standard deviation   if similarity >= U
         { average rating                         if similarity < U      (5.1)

Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset.
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more times, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more times, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact in the Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse if the recommendation error starts to converge after a determined amount of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Foodcom dataset, up to 100 rated recipes.
Figure 5.10: Learning curve using the Foodcom dataset, up to 500 rated recipes.
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; adding the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that best suited the objective of the experiments, i.e., that contained user reviews, allowing the validation of the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, or total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to pursue in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio's method, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute a predicted rating to it.
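A minimal sketch of this idea follows, assuming recipes are represented as sparse TF-IDF-style feature dictionaries; the function names and toy data are illustrative, not part of the implemented system.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    num = sum(w * b.get(f, 0.0) for f, w in a.items())
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0

def build_class_vectors(rated_recipes):
    """One prototype vector per observed rating value: the average of the
    feature vectors of the recipes the user rated with that value."""
    groups = defaultdict(list)
    for features, rating in rated_recipes:
        groups[rating].append(features)
    prototypes = {}
    for rating, vecs in groups.items():
        feats = set().union(*vecs)
        prototypes[rating] = {f: sum(v.get(f, 0.0) for v in vecs) / len(vecs)
                              for f in feats}
    return prototypes

def predict_rating(recipe, prototypes):
    """The rating whose class vector is most similar to the recipe is the
    prediction, so no similarity-to-rating transformation is needed."""
    return max(prototypes, key=lambda r: cosine(recipe, prototypes[r]))

# toy user history: (feature vector, rating)
history = [({"garlic": 1.0, "tomato": 0.5}, 5),
           ({"garlic": 0.8, "basil": 0.6}, 5),
           ({"cream": 1.0, "mushroom": 0.7}, 1)]
prototypes = build_class_vectors(history)
print(predict_rating({"garlic": 0.9, "tomato": 0.3}, prototypes))
```

Averaging the vectors per rating class is only one possible choice here; a Rocchio-style weighting with negative examples could be used instead.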
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in Computer Science and Information Systems – a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Acronyms
YoLP - Your Lunch Pal
IF - Information Filtering
IR - Information Retrieval
VSM - Vector Space Model
TF - Term Frequency
IDF - Inverse Document Frequency
IRF - Inverse Recipe Frequency
MAE - Mean Absolute Error
RMSE - Root Mean Square Error
CBCF - Content-Boosted Collaborative Filtering
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to them. Typically, some type of user model is employed to filter the data. Building on developments in Information Filtering (IF), the more modern recommendation systems [2] share the same purpose, but instead of presenting all the relevant information to the user, only the items that best fit the user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular on e-commerce websites and on online services related to movies, music, books, social bookmarking, and product sales in general, and new ones appear every day. All these areas have one thing in common: users want to explore the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings that focus on user preferences. The study of current methods for supporting the development of recommendation systems, and of how they can apply to food recommendation, is therefore a topic of great interest.
In this work, the applicability of content-based methods to personalized food recommendation is explored. To do so, a recommendation system and an evaluation benchmark were developed. The study of new variations of content-based methods, adapted to food recommendation, is validated with the use of performance metrics that capture the accuracy of the predicted ratings. In order to validate the results, the experimental component is directly compared with a set of baseline methods, amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF developed in [3]. This work presented good results in retrieving the user's favorite ingredients, which raised the following question: could these results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were also performed. The algorithm's learning curve and the impact of the standard deviation on the recommendation error were also analysed. Furthermore, a feature test was performed to discover the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the development of a recommendation system in the food domain. The project is entitled Your Lunch Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant to explore the available items in the restaurant's menu, as well as to receive recommendations specifically adjusted to his personal taste, based on his consumer behaviour. The mobile application also allows clients to order and pay for the items electronically. To this end, the recommendation system in YoLP needs to understand the preferences of users, through the analysis of food consumption data and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, and features that are interesting in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described; the recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better support the proposed objectives and the related work discussed in the following chapter. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval methods and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify a solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data in the form of free text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of that term to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}     (2.1)

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right)     (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

w_{i,j} = TF_{i,j} \times IDF_i     (2.3)
It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when assigning a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{z,j})^2}}     (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = \frac{\sum_k w_{k,a} w_{k,b}}{\sqrt{\sum_k w_{k,a}^2} \sqrt{\sum_k w_{k,b}^2}}     (2.5)
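Equations 2.1–2.5 can be sketched in a few lines. The following is an illustrative toy implementation over tokenized documents (the three-document corpus is an assumption for the example), not the code used in this work.

```python
import math

def tf_idf_vectors(docs):
    """Build cosine-normalized TF-IDF vectors (Eqs. 2.1-2.4) for a list of
    tokenized documents."""
    n = len(docs)
    vocab = {t for doc in docs for t in doc}
    # document frequency: number of documents containing each term
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    vectors = []
    for doc in docs:
        counts = {t: doc.count(t) for t in set(doc)}
        max_f = max(counts.values())
        vec = {t: (counts[t] / max_f) * math.log(n / df[t]) for t in counts}
        norm = math.sqrt(sum(w * w for w in vec.values()))  # Eq. 2.4
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse weight vectors."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0

docs = [["pasta", "tomato", "basil"],
        ["pasta", "cream", "mushroom"],
        ["rice", "tomato", "pepper"]]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents sharing a rare term score higher than documents sharing a common one, which is exactly the effect the IDF factor is meant to produce.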
Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c_i} = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, with T being the vocabulary, i.e., the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|}     (2.6)

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c_i} and the document vector \vec{d_j}.
Although this method has an intuitive justification, it does not have any theoretic underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
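A compact sketch of Rocchio prototype learning (Eq. 2.6) and classification follows. The β and γ values and the toy TF-IDF vectors are illustrative assumptions, not values used in this work.

```python
import math

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between sparse {term: weight} vectors."""
    num = sum(w * b.get(k, 0.0) for k, w in a.items())
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0

def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    """Eq. 2.6: weighted difference between the averaged positive and
    negative example vectors for one class."""
    terms = set().union(*pos, *neg)
    return {k: beta * sum(d.get(k, 0.0) for d in pos) / len(pos)
               - gamma * sum(d.get(k, 0.0) for d in neg) / len(neg)
            for k in terms}

def classify(doc, prototypes):
    """Assign doc to the class whose prototype vector is most similar."""
    return max(prototypes, key=lambda c: cosine(doc, prototypes[c]))

liked = [{"garlic": 0.8, "tomato": 0.6}, {"garlic": 0.7, "basil": 0.7}]
disliked = [{"cream": 0.9, "mushroom": 0.4}]
prototypes = {"relevant": rocchio_prototype(liked, disliked),
              "irrelevant": rocchio_prototype(disliked, liked)}
print(classify({"garlic": 1.0, "tomato": 0.2}, prototypes))
```

Terms that occur in the negative examples receive negative prototype weights, so a document dominated by them will score low against that class.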
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): the probability of observing a document in class c
• P(d|c): the probability of observing the document d given a class c
• P(d): the probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

P(c|d) = \frac{P(c) \, P(d|c)}{P(d)}     (2.7)

When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)}     (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. The classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to observe the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, reflecting whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)}     (2.9)

In the formula, N(d_i, t_k) represents the number of times the word, or term, t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document d_i are used.
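The multinomial model of Eq. (2.9) can be sketched as below. The computation is done in log space, and Laplace smoothing is added to avoid zero probabilities (a standard refinement not shown in the equation); the toy corpus and class names are illustrative.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    """Estimate log P(c) and smoothed log P(t|c) from tokenized documents."""
    classes = set(labels)
    vocab = {t for d in docs for t in d}
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, c in zip(docs, labels):
        counts[c].update(d)
    cond = {}
    for c in classes:
        total = sum(counts[c].values())
        # Laplace smoothing: add 1 to every term count
        cond[c] = {t: math.log((counts[c][t] + 1) / (total + len(vocab)))
                   for t in vocab}
    return prior, cond

def classify(doc, prior, cond):
    """argmax_c log P(c) + sum_k N(d, t_k) log P(t_k|c): log form of Eq. 2.9."""
    def score(c):
        return prior[c] + sum(n * cond[c][t]
                              for t, n in Counter(doc).items() if t in cond[c])
    return max(prior, key=score)

docs = [["pasta", "tomato"], ["pasta", "basil"], ["cream", "mushroom"]]
labels = ["italian", "italian", "french"]
prior, cond = train_multinomial_nb(docs, labels)
print(classify(["pasta", "pasta"], prior, cond))
```

The exponent N(d_i, t_k) of Eq. (2.9) becomes the multiplier n in the log-space sum, so repeated words weigh proportionally more.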
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms; branches originating from them are labelled according to tests on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is inefficiency at classification time: since they have no training phase, all the computation takes place when classifying.
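A minimal k-nearest-neighbor classifier over structured data, using Euclidean distance and a majority vote, might look like this; the toy points and the choice of k are arbitrary illustrations.

```python
import math
from collections import Counter

def knn_classify(item, training, k=3):
    """training: list of (feature tuple, label) pairs. The label comes from
    a majority vote over the k nearest examples (Euclidean distance)."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(training, key=lambda ex: dist(item, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [((0, 0), "a"), ((0, 1), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify((1, 0), training))
```

Note that the full training set is scanned for every query, which is exactly the classification-time cost discussed above.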
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd" and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} \, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2} \sqrt{\sum_{s \in S} r_{b,s}^2}}     (2.10)

In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically constant. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2 \sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}}     (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \, (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}     (2.12)

In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that have rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
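Using the data from Table 2.1, the Pearson similarity (Eq. 2.11) and the prediction function (Eq. 2.12) can be sketched as follows. The absolute value in the denominator is a common practical refinement of Eq. 2.12 (it keeps negative similarities from flipping the sign of the weight sum), and the prediction may fall slightly outside the rating scale, in which case it is usually clipped.

```python
import math

ratings = {  # the running example from Table 2.1 (Item5 unknown to Alice)
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items users a and b share."""
    common = set(ratings[a]) & set(ratings[b])
    ra = sum(ratings[a][s] for s in common) / len(common)
    rb = sum(ratings[b][s] for s in common) / len(common)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common)
                    * sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item):
    """Weighted-deviation prediction (Eq. 2.12) over neighbours that rated item."""
    neighbours = [b for b in ratings if b != a and item in ratings[b]]
    avg_a = sum(ratings[a].values()) / len(ratings[a])
    num = sum(pearson(a, b) * (ratings[b][item]
              - sum(ratings[b].values()) / len(ratings[b]))
              for b in neighbours)
    den = sum(abs(pearson(a, b)) for b in neighbours)
    return avg_a + num / den if den else avg_a

print(predict("Alice", "Item5"))
```

Because most neighbours rated Item5 above their own averages, the prediction lands above Alice's average of 4, matching the intuition developed around Figure 2.2.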
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [15, 16].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayes model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section; other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some of these shortcomings, and even to attain desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}    (2.14)
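The two measures can be computed directly from the sets of recommended and relevant items; the sketch below is illustrative, with set-based inputs as an assumption:

```python
def precision_recall(recommended, relevant):
    """Precision (Eq. 2.13) and Recall (Eq. 2.14) from a set of
    recommended items and the set of items the user actually liked."""
    tp = len(recommended & relevant)   # true positives
    fp = len(recommended - relevant)   # false positives
    fn = len(relevant - recommended)   # false negatives
    p = tp / (tp + fp) if recommended else 0.0
    r = tp / (tp + fn) if relevant else 0.0
    return p, r
```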
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
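Both error measures translate directly into code; a minimal sketch over parallel lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error, Eq. (2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error, Eq. (2.16); squaring penalizes large deviations."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))
```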
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1. http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to explore further in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from the recipe browsing history (i.e., recipes searched for) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^+_k, an equation based on the idea of TF-IDF is used:
I^+_k = FF_k \times IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D}    (3.2)
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
IRF_k = \log \frac{M}{M_k}    (3.3)
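As an illustrative sketch of Eqs. (3.1) to (3.3), the favourite-ingredient score can be computed as below; note that the base of the logarithm is an assumption here, since [3] does not state it:

```python
import math

def irf(M, M_k):
    """Inverse Recipe Frequency, Eq. (3.3); base-10 log is an assumption."""
    return math.log10(M / M_k)

def favourite_score(F_k, D, M, M_k):
    """I+_k = FF_k x IRF_k, combining Eqs. (3.1) and (3.2)."""
    ff = F_k / D  # FF_k: frequency of use of ingredient k during period D
    return ff * irf(M, M_k)
```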
The user's disliked ingredients I^-_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I^+_k, were computed. The F-measure is computed as follows:
F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in its evaluation were not satisfactory.
In this work, a recipe's score is determined by whether given ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve on this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the extended system also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1. http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion in quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^+_k and I^-_k, respectively):
Score(R) = \sum_{k \in R} (I_k \cdot W_k)    (3.6)
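Eqs. (3.5) and (3.6) can be sketched as follows; the dict-based layout for the preference and weight values is an illustrative assumption:

```python
import math

def std_dev(quantities):
    """sigma_k, Eq. (3.5): population standard deviation of ingredient k's
    quantities over the n recipes that contain it."""
    n = len(quantities)
    mean = sum(quantities) / n  # average quantity of the ingredient
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Score(R), Eq. (3.6): sum of I_k * W_k over the ingredients in R.
    preference (I_k) and weight (W_k) are dicts keyed by ingredient."""
    return sum(preference[k] * weight[k] for k in recipe_ingredients)
```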
The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u where available, and of those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}    (3.7)
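Eq. (3.7) amounts to a simple fallback lookup; a minimal sketch, assuming dict-based ratings and predictions (an illustrative layout, not the paper's implementation):

```python
def pseudo_ratings(user_ratings, content_predictions, items):
    """v_u from Eq. (3.7): the actual rating r_ui where available,
    the content-based prediction c_ui otherwise."""
    return [user_ratings.get(i, content_predictions[i]) for i in items]
```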
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
MAE was one of the two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with an MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms was evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendation.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, using the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The two strategies differ in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered; the assumption is that the items common to recipes with mixed ratings are not the cause of the high variation in scores. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE scores for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should make great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as weights. If a voter is closer than a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and a story the user already knows does not need to be recommended. If a story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendation as well, since users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
1. https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation²
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)
where a and b are recipes, r_{a,p} is the rating from user p for recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)
2. http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u on item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the result is normalized by the sum of the similarities to obtain the predicted rating.
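Following the description above, the item-based prediction of Eq. (4.2) can be sketched as a similarity-weighted average of the user's own ratings; the function and argument names are illustrative assumptions:

```python
def item_based_predict(user_ratings, sims_to_target):
    """pred(u, a), Eq. (4.2): similarity-weighted average of the user's
    own ratings. sims_to_target maps each rated item b to sim(a, b)."""
    rated = [b for b in user_ratings if b in sims_to_target]
    num = sum(sims_to_target[b] * user_ratings[b] for b in rated)
    den = sum(sims_to_target[b] for b in rated)
    return num / den if den else None  # None: no overlap, no prediction
```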
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and to compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
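Eq. (4.3) translates into a one-line mapping; a minimal sketch, with the threshold and boost values taken from the formula above:

```python
def similarity_to_rating(similarity, avg_total, threshold=0.8, boost=0.5):
    """Eq. (4.3): return the combined user/item average, boosted by 0.5
    when the cosine similarity exceeds 0.8."""
    return avg_total + boost if similarity > threshold else avg_total
```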
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure can be used to attribute weights to a recipe's features and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, a recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
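The update described above can be sketched as follows, assuming precomputed IRF weights; the data layout is an illustrative choice, not the thesis implementation:

```python
def build_prototype(rated_recipes, irf_weights):
    """Rocchio-style prototype vector: feature weights are added for
    positive observations and subtracted for negative ones, with equal
    weight 1 for both observation types, as in the text.

    rated_recipes: list of (recipe feature set, is_positive) pairs.
    irf_weights: dict mapping feature -> IRF weight, Eq. (4.4).
    """
    profile = {}
    for features, positive in rated_recipes:
        sign = 1.0 if positive else -1.0
        for f in features:
            profile[f] = profile.get(f, 0.0) + sign * irf_weights[f]
    return profile
```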
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on recipes, and contain rating events from users on recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
29
vector so a method is needed to translate the similarity into a rating This topic is very important to
explore since it can introduce considerate errors in the validation results Next two approaches are
presented to translate the similarity value into a rating
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula3:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were therefore applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
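A minimal sketch of this per-user mapping, including the fallback to the user average when the similarity interval cannot be computed (the function name and exact fallback test are assumptions of the sketch):

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Map a similarity value A from the user's observed similarity range
    [sim_min, sim_max] into the user's rating range [C, D], following
    Eq. 4.5.  Falls back to the user's average rating when the similarity
    interval is degenerate (not enough ratings for this user)."""
    if sim_max == sim_min:
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min
```

For example, a similarity halfway through a user's [0, 1] similarity range maps to the middle of a 1-5 rating range.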
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L    (4.6)
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
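Eq. 4.6 translates directly into code; one function covers all three variants, since only the source of the average and standard deviation changes:

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Translate a cosine similarity into a rating following Eq. 4.6.
    `avg` and `std` can be the user's, the recipe's, or their combined
    average and standard deviation, matching the three variants tested.
    The default thresholds are the initial U = 0.75 and L = 0.25."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```

A high similarity pushes the prediction one standard deviation above the average, a low similarity one standard deviation below, and anything in between falls back to the average itself.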
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments.

                                     Foodcom   Epicurious
Number of users                      24741     8117
Number of food items                 226025    14976
Number of rating events              956826    86574
Number of ratings above avg          726467    46588
Number of groups                     108       68
Number of ingredients                5074      338
Number of categories                 28        14
Sparsity on the ratings matrix       0.02%     0.07%
Avg rating value                     4.68      3.34
Avg number of ratings per user       38.67     10.67
Avg number of ratings per item       4.23      5.78
Avg number of ingredients per item   8.57      3.71
Avg number of categories per item    2.33      0.60
Avg number of food groups per item   0.87      0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community4. The second dataset is composed of data crawled from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
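The filtering step can be sketched as follows. The text does not state whether the filter was applied once or repeated, so this sketch iterates until both conditions hold simultaneously, since removing sparse recipes can push users back below their threshold (an assumption of the sketch):

```python
def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times and users who rated no
    more than 5 times.  `ratings` is an iterable of (user, item, rating)
    tuples; iterates until both constraints hold."""
    ratings = list(ratings)
    while True:
        users, items = {}, {}
        for u, i, _ in ratings:
            users[u] = users.get(u, 0) + 1
            items[i] = items.get(i, 0) + 1
        kept = [(u, i, r) for u, i, r in ratings
                if users[u] >= min_user_ratings and items[i] >= min_item_ratings]
        if len(kept) == len(ratings):   # nothing removed: fixed point reached
            return kept
        ratings = kept
```

Usage on a toy matrix: six users who each rated six items all survive, while a user with a single rating is dropped.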
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value.

Figure 4.4: Distribution of Foodcom rating events per rating value.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.

6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
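The 5-fold protocol can be sketched as follows (the shuffle seed and the round-robin fold assignment are choices of the sketch, not of the thesis):

```python
import random

def five_fold_splits(events, seed=0):
    """Partition the rating events into 5 folds; each fold in turn is the
    20% validation set while the remaining 80% form the training set."""
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[k::5] for k in range(5)]
    for k in range(5):
        validation = folds[k]
        training = [e for j, fold in enumerate(folds) if j != k
                    for e in fold]
        yield training, validation
```

Each of the 5 rounds evaluates on a different 20% slice, so every rating event is used for validation exactly once across the full run.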
Figure 5.1: 10-fold cross-validation example.

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents the information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of the content-based algorithms.
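The two measures can be computed as:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between the
    predicted and the actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on the larger ones."""
    return math.sqrt(sum((p - a) ** 2
                         for p, a in zip(predicted, actual)) / len(actual))
```

For the predictions [3, 4] against the actual ratings [3, 2], MAE is 1.0 while RMSE is √2 ≈ 1.41, illustrating how RMSE penalizes the single large miss more heavily.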
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines.

                               Epicurious        Foodcom
                               MAE     RMSE      MAE     RMSE
YoLP Content-based component   0.6389  0.8279    0.3590  0.6536
YoLP Collaborative component   0.6454  0.8678    0.3761  0.6834
User Average                   0.6315  0.8338    0.4077  0.6207
Item Average                   0.7701  1.0930    0.4385  0.7043
Combined Average               0.6628  0.8572    0.4180  0.6250

Table 5.2: Test results.

                                  Epicurious                        Foodcom
                                  Observation     Observation       Observation     Observation
                                  User Average    Fixed Threshold   User Average    Fixed Threshold
                                  MAE     RMSE    MAE     RMSE      MAE     RMSE    MAE     RMSE
User Avg + User Std. Deviation    0.8217  1.0606  0.7759  1.0283    0.4448  0.6812  0.4287  0.6624
Item Avg + Item Std. Deviation    0.8914  1.1550  0.8388  1.1106    0.4561  0.7251  0.4507  0.7207
User/Item Avg + User and
Item Std. Deviation               0.8304  1.0296  0.7824  0.9927    0.4390  0.6506  0.4324  0.6449
Min-Max                           0.8539  1.1533  0.7721  1.0705    0.6648  0.9847  0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features.

                                    Epicurious        Foodcom
                                    MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616    0.4324  0.7087
Ingredients                         0.7932  1.0054    0.4411  0.6537
Cuisine                             0.8553  1.0810    0.5357  0.7431
Dietary                             0.8772  1.0807    0.4579  0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
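A sketch of this merge-and-compare step, assuming the three per-type vectors use disjoint feature keys (ingredient, cuisine and dietary names do not collide; function names are illustrative):

```python
import math

def merge_vectors(*vectors):
    """Merge any subset of the separately stored per-feature-type
    prototype vectors (ingredients, cuisines, dietaries) into one,
    so feature combinations can be tested without rebuilding."""
    merged = {}
    for vec in vectors:
        merged.update(vec)   # safe under the disjoint-keys assumption
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Testing, say, Ingredients + Cuisine then amounts to `cosine(merge_vectors(ingredients_vec, cuisine_vec), recipe_vec)`, reusing the stored vectors untouched.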
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L

The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The pronounced drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset.
Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if similarity < U    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
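The threshold sweep can be sketched as follows (the tuple-based representation of the validation cases is an illustration, not the actual component's interface):

```python
import math

def sweep_upper_threshold(cases, thresholds):
    """For each candidate upper threshold U, rate every validation case
    with the two-branch rule of Eq. 5.1 (avg + std when similarity >= U,
    else avg) and record the resulting (MAE, RMSE) pair.
    `cases` are (similarity, avg, std, actual_rating) tuples."""
    results = {}
    for u in thresholds:
        errs = [(avg + std if sim >= u else avg) - actual
                for sim, avg, std, actual in cases]
        results[u] = (sum(abs(e) for e in errs) / len(errs),
                      math.sqrt(sum(e * e for e in errs) / len(errs)))
    return results
```

Plotting the resulting MAE and RMSE against U reproduces the kind of curves shown in Figures 5.4 and 5.5, from which the best-performing threshold can be read off.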
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation, from the Epicurious dataset.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error occurred towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset, and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation, from the Foodcom dataset.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users who rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors over (see Fig. 5.8). In Foodcom, 1571 users were found who rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Foodcom dataset, up to 100 rated recipes.

In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes marking a threshold where the recommendation error stagnates.

Figure 5.10: Learning curve using the Foodcom dataset, up to 500 rated recipes.
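The simulation can be sketched as follows, with a hypothetical `train_and_error` callback standing in for rebuilding the user's prototype vector and measuring the error on the held-out reviews:

```python
def learning_curve(user_reviews, train_and_error, max_train=40):
    """Simulate continuous learning for one user: in each round, one more
    review is moved from the validation set into the training set, the
    model is rebuilt and the error on the remaining reviews recorded.
    `train_and_error` is a (training, validation) -> error callback."""
    curve = []
    for k in range(1, max_train + 1):
        training, validation = user_reviews[:k], user_reviews[k:]
        if not validation:     # nothing left to validate against
            break
        curve.append(train_and_error(training, validation))
    return curve
```

Averaging such per-user curves over the selected user groups (71, 1571 or 269 users) yields the curves shown in Figures 5.8 to 5.10.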
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations produced the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values for the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information contains only the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both from the recipes and from the prototype vectors, and, adding the major difference in the dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. doi: 10.1145/502716.502737.
[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-
orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling
and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A
1026501525781
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Based on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services
related to movies, music, books, social bookmarking, and product sales in general, and new ones
appear every day. All these areas have one thing in common: users want to explore the space of
options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can be applied to food recommendation, is thus a topic
of great interest.
In this work, the applicability of content-based methods to personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods, adapted to food recommendation, is validated
with performance metrics that capture the accuracy of the predicted ratings. In order to validate
the results, the experimental component is directly compared against a set of baseline methods,
amongst them the YoLP content-based and collaborative components.
The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of treating the ingredients in a recipe as analogous to the
words in a document led to the variation of TF-IDF developed in [3]. This work presented good
results in retrieving the user's favorite ingredients, which raised the following question: could these
results be further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed. The algorithm's learning curve and the impact of the standard deviation on the
recommendation error were also analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of recommen-
dation systems, introducing various fundamental concepts and describing some of the most popular
recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation
approaches are analysed, and interesting features in the context of personalized food recommen-
dation are highlighted. In Chapter 4, the modules that compose the architecture of the developed
system are described; the recommendation methods are explained in detail, and the datasets are
introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work, and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given, and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the related work discussed in the following chapter.
These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based rec-
ommendations. Content-based approaches, instead, relate more to classical Information Retrieval
methods, and focus on keywords as content descriptors to generate recommendations. Because
of this, content-based methods are very popular when recommending documents, news articles,
or web pages, for example.
Knowledge-based systems suggest products based on inferences about the user's needs and pref-
erences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify the solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of collabo-
rative and content-based systems, such as the well-known cold-start problem, which is explained later
in this section.
In the rest of this section, some of the most popular approaches for content-based and collabo-
rative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching the attributes of an ob-
ject against a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:
• Structured Data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known.
• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are
usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free-text. As
mentioned previously, content needs to be analysed, and the information in it needs to be trans-
lated into quantitative values, so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the
relevance of that term to the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Fre-
quency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:
$$ TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \quad (2.1) $$
where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j.
This value is divided by \max_z f_{zj}, the maximum frequency observed
over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords become more relevant than frequent keywords. IDF is defined as follows:
$$ IDF_i = \log\left(\frac{N}{n_i}\right) \quad (2.2) $$
In the formula, N is the total number of documents, and n_i represents the number of documents in
which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword i in a document j as:
$$ w_{ij} = TF_{ij} \times IDF_i \quad (2.3) $$
It is important to notice that TF-IDF does not identify the context in which words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document, and their occurrence in other documents, are taken into consideration when assigning a
weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer docu-
ments from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:
$$ w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{zj})^2}} \quad (2.4) $$
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
$$ Similarity(a, b) = \frac{\sum_k w_{ka} \, w_{kb}}{\sqrt{\sum_k w_{ka}^2} \sqrt{\sum_k w_{kb}^2}} \quad (2.5) $$
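To make the pipeline concrete, the weighting scheme of Eqs. (2.1)–(2.4) and the similarity of Eq. (2.5) can be sketched in a few lines of Python. This is an illustrative sketch only, assuming documents have already been tokenized into lists of keywords (stemming and stop-word removal are omitted):

```python
import math

def tfidf_vectors(docs):
    """Cosine-normalized TF-IDF vectors (Eqs. 2.1-2.4) for a list of
    token lists; returns one sparse {term: weight} dict per document."""
    n = len(docs)
    df = {}  # n_i: number of documents in which term i occurs
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        max_f = max(counts.values())
        # w_ij = (f_ij / max_z f_zj) * log(N / n_i)
        w = {t: (f / max_f) * math.log(n / df[t]) for t, f in counts.items()}
        norm = math.sqrt(sum(x * x for x in w.values()))
        vectors.append({t: x / norm for t, x in w.items()} if norm else w)
    return vectors

def cosine(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse weight vectors."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0
```

Note that a term occurring in every document receives an IDF of zero, as the formula dictates: it contributes nothing to distinguishing documents.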
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system ac-
cording to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, the document vec-
tors of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then clas-
sified according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector $\vec{c_i} = (w_{1i}, \ldots, w_{|T|i})$ for each
class $c_i$, where T is the vocabulary, composed by the set of distinct terms in the training set. The weight
for each term is given by the following formula:
$$ w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \quad (2.6) $$
In the formula, $POS_i$ and $NEG_i$ represent the positive and negative examples in the training set for
class $c_i$, and $w_{kj}$ is the TF-IDF weight for term k in document $d_j$. Parameters $\beta$ and $\gamma$ control the
influence of the positive and negative examples. The document $d_j$ is assigned to the class $c_i$ with
the highest similarity value between the prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$.
Although this method has an intuitive justification, it does not have any theoretic underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learn-
ing, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied ex-
tensively [8].
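The prototype computation of Eq. (2.6) and the subsequent similarity-based classification can be sketched as follows. This is a minimal illustration, not the implementation used in this work; the default values β = 16 and γ = 4 are a conventional choice from the relevance feedback literature and are an assumption here:

```python
import math

def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    """Prototype vector for one class (Eq. 2.6): a beta-weighted average of
    the positive examples minus a gamma-weighted average of the negative
    ones. `pos` and `neg` are lists of sparse {term: tfidf_weight} dicts."""
    proto = {}
    for docs, coef in ((pos, beta / len(pos)), (neg, -gamma / len(neg))):
        for d in docs:
            for t, w in d.items():
                proto[t] = proto.get(t, 0.0) + coef * w
    return proto

def classify(doc, prototypes):
    """Assign doc to the class whose prototype is most cosine-similar."""
    def cos(a, b):
        num = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return num / (na * nb) if na and nb else 0.0
    return max(prototypes, key=lambda c: cos(doc, prototypes[c]))
```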
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various ma-
chine learning methods are other examples of techniques used to perform content-based rec-
ommendation. These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d be-
longing to a class c, using a set of probabilities previously calculated from the observed data, or
training data as it is commonly called. These probabilities are:
• P(c): the probability of observing a document in class c
• P(d|c): the probability of observing the document d given a class c
• P(d): the probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying the Bayes theorem:
$$ P(c|d) = \frac{P(c) \, P(d|c)}{P(d)} \quad (2.7) $$
When performing classification, each document d is assigned to the class $c_j$ with the highest
probability:
$$ \arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)} \quad (2.8) $$
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant or irrelevant
documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is deter-
mined based on individual word occurrences, rather than the document as a whole. This simplifica-
tion is needed due to the fact that it is very unlikely to see the exact same document more than once;
without it, the observed data would not be enough to generate good probabilities. Although this sim-
plification clearly violates the conditional independence assumption, since terms in a document are
not, in theory, independent from each other, experiments show that the Naive Bayes classifier achieves
very good results when classifying text documents. Two different models are commonly used when
working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli
event model, encodes each word as a binary attribute, reflecting the appearance of
words in a document. The second, typically referred to as the multinomial event model, accounts for the
number of times the words appear in the document. Both models see the document as a vector
of values over a vocabulary V, and both lose the information about word order. Empirically,
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model,
especially for large vocabularies [9]. This model is represented by the following equation:
$$ P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \quad (2.9) $$
In the formula, $N(d_i, t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$.
Therefore, only the words from the vocabulary V that appear in the document ($t_k \in V_{d_i}$) are used.
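A minimal sketch of the multinomial model of Eq. (2.9) follows. Laplace (add-one) smoothing is used so that unseen term-class pairs do not zero out the product, and the score is computed in log-space to avoid underflow; both choices are standard practice rather than part of the equation itself:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    """Multinomial Naive Bayes (Eq. 2.9) with add-one smoothing.
    docs: list of token lists; labels: parallel list of class names."""
    vocab = {t for d in docs for t in d}
    priors, cond = {}, {}
    for c in set(labels):
        in_c = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = len(in_c) / len(docs)       # P(c)
        counts = Counter(t for d in in_c for t in d)
        total = sum(counts.values())
        # P(t|c), smoothed over the whole vocabulary
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond

def nb_predict(doc, priors, cond):
    """argmax_c  log P(c) + sum_k N(d, t_k) * log P(t_k|c)."""
    scores = {}
    for c in priors:
        s = math.log(priors[c])
        for t in doc:
            if t in cond[c]:                    # out-of-vocabulary terms skipped
                s += math.log(cond[c][t])
        scores[c] = s
    return max(scores, key=scores.get)
```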
Decision trees and nearest neighbor methods are other examples of important learning algo-
rithms used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning the training data into subgroups, until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes represent labelled terms;
branches originating from them are labelled according to tests done on the weight that the term
has in the document, and leaves are labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion when learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all the training data in memory. When classifying a new,
unlabeled item, the algorithm compares it to all stored items using a similarity function, and then
determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data: the Euclidean distance metric is often chosen when working
with structured data, while for items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback
is their inefficiency at classification time: since they have no training phase, all
the computation takes place during classification.
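The nearest neighbor scheme described above can be sketched as follows, for items represented as sparse VSM vectors compared with cosine similarity. Note how all the work happens at classification time, since "training" is just storing the labelled vectors:

```python
import math
from collections import Counter

def knn_classify(item, training, k=3):
    """Majority vote among the k training items most cosine-similar to `item`.
    `training` is a list of (sparse_vector, label) pairs kept in memory."""
    def cos(a, b):
        num = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return num / (na * nb) if na and nb else 0.0
    # the expensive part: compare against every stored example
    neighbours = sorted(training, key=lambda p: cos(item, p[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```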
These algorithms represent some of the most important methods used in content-based recom-
mendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed au-
tomatically by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user pro-
files, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a par-
ticular user, based on the items previously rated by other users. This approach is also known as the
wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes, or preferences, the system has
to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various algo-
rithms and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the recom-
mendation. These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are essen-
tially heuristics that make rating predictions based on the entire collection of items previously rated
by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5. Item5 is unknown to Alice, and the recommendation system needs to gener-
ate a prediction. The set of ratings S previously mentioned represents the ratings given by User1,
User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would
give to Item5. In the simplest case, the predicted rating is computed using the average of the values
contained in set S. However, the most common approach is to use a weighted sum, where the level
of similarity between users defines the weight value to use when computing the rating. For example,
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction. The similarity measure between users is used to simplify the rating estimation procedure
[12]. Two users have a high similarity value when they both rate the same group of items in an iden-
tical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional
space, where m represents the number of rated items in common. The similarity measure results
from computing the cosine of the angle between the two vectors:
$$ Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \sqrt{\sum_{s \in S} r_{bs}^2}} \quad (2.10) $$
In the formula, $r_{as}$ is the rating that user a gave to item s, and $r_{bs}$ is the rating that user b gave
to the same item. However, this measure does not take into consideration an important factor,
namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a
similar way: the difference in rating values between the four items is practically consistent. With
the cosine similarity measure, these users are considered highly similar, which may not always be
the case, since only the items they have in common are contemplated. In fact, if Alice usually rates
items with low values, we can conclude that these four items are amongst her favourites. On the
other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It
is then clear that the average ratings of each user should be analyzed, in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-
based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
$$ sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \overline{r}_a)(r_{bs} - \overline{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \overline{r}_a)^2} \sqrt{\sum_{s \in S} (r_{bs} - \overline{r}_b)^2}} \quad (2.11) $$
In the formula, $\overline{r}_a$ and $\overline{r}_b$ are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:
$$ pred(a, p) = \overline{r}_a + \frac{\sum_{b \in N} sim(a, b) \times (r_{bp} - \overline{r}_b)}{\sum_{b \in N} sim(a, b)} \quad (2.12) $$
In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users
most similar to user a that rated item p. This function determines whether the neighbors' ratings for
Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined,
using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's
average rating. The value obtained through this procedure corresponds to the predicted rating.
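Applying Eqs. (2.11) and (2.12) to the ratings of Table 2.1 gives a concrete illustration of the procedure. In this sketch, the neighbourhood N is simply taken as every positively correlated user who rated the target item, rather than a fixed number of most similar users:

```python
import math

ratings = {  # Table 2.1; Alice has not rated Item5
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users,
    with deviations taken from each user's own average rating."""
    common = set(ratings[a]) & set(ratings[b])
    ma, mb = mean(a), mean(b)
    num = sum((ratings[a][s] - ma) * (ratings[b][s] - mb) for s in common)
    da = math.sqrt(sum((ratings[a][s] - ma) ** 2 for s in common))
    db = math.sqrt(sum((ratings[b][s] - mb) ** 2 for s in common))
    return num / (da * db) if da and db else 0.0

def predict_rating(a, item):
    """Weighted-sum prediction (Eq. 2.12); N = positively correlated raters."""
    nbrs = [(pearson(a, b), b) for b in ratings if b != a and item in ratings[b]]
    nbrs = [(s, b) for s, b in nbrs if s > 0]
    num = sum(s * (ratings[b][item] - mean(b)) for s, b in nbrs)
    return mean(a) + num / sum(s for s, _ in nbrs)
```

For Table 2.1, this yields sim(Alice, User1) ≈ 0.84 and a predicted rating for Item5 of roughly 4.85, above Alice's average of 4, since her most similar neighbours rated Item5 above their own averages.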
Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one com-
mon strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand, using the precomputed similarities. Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing ratings from them [15]. Empiri-
cal evidence suggests that item-based algorithms can provide, with better computational
performance, results of comparable or better quality than the best available user-based
collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative meth-
ods: the system must first learn the user's preferences from previously rated items, in order to
perform accurate recommendations. Several techniques have been proposed to address this prob-
lem. Most of them use the hybrid recommendation approach presented in the next section; other
techniques use strategies based on item popularity, item entropy, user personalization, and combi-
nations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until
the new item is rated by a sufficient number of users, the recommender system will not recommend
it. Hybrid methods can also address this problem. Data sparsity is another problem that should
be considered, as the number of rated items is usually very small when compared to the number of
ratings that need to be predicted. User profile information, like age, gender, and other attributes, can
also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limita-
tions. The idea behind hybrid systems [19] is to combine two or more different elements, in order to
avoid some shortcomings and even reach desirable properties not present in the individual approaches.
Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly
used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or knowl-
edge sources from each strategy to generate a recommendation. An example of this design is
content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user), in order to
improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components receive the same input, as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation; weights can be assigned manually or learned
dynamically. This design can be applied when two components perform well individually but
complement each other in different situations (e.g., when few ratings exist, recommend
popular items; otherwise, use collaborative methods).
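A weighted parallelized hybrid reduces to a very small amount of glue code, as the sketch below illustrates; the two component predictors in the usage example are hypothetical stand-ins for any content-based and collaborative scoring functions:

```python
def weighted_hybrid(predictors, weights, user, item):
    """Parallelized hybridization by weighting: each component predicts the
    rating independently, and the final score is their weighted average.
    `predictors` is a list of functions (user, item) -> rating; the weights
    are assumed to sum to 1 (they could also be learned dynamically)."""
    return sum(w * p(user, item) for p, w in zip(predictors, weights))
```

For instance, with a content-based component returning 4.0 and a collaborative one returning 2.0, weights of 0.75 and 0.25 yield a hybrid score of 3.5.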
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is then exploited by the principal component
to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algo-
rithms can be improved by being hybridized with other techniques. It is important that the recom-
mendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective, many variables can be (and have been) studied: the increase in number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be anal-
ysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of rec-
ommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where
recommended items, like retrieved items, are predicted to be good, or relevant. Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user, or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.
Precision measures the exactness of the recommendations ie the fraction of relevant items
recommended (tp) out of all recommended items (tp+ fp)
Figure 2.7: Evaluating recommended items [2]
Precision = tp / (tp + fp)   (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):

Recall = tp / (tp + fn)   (2.14)
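These two measures follow directly from the confusion counts. The following minimal Python sketch (the function and variable names are ours, not from any cited system) illustrates Eqs. 2.13 and 2.14:

```python
def precision(tp, fp):
    """Fraction of recommended items that were actually relevant (Eq. 2.13)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant items that were actually recommended (Eq. 2.14)."""
    return tp / (tp + fn)

# Example: 8 liked recommendations, 2 disliked ones, 4 liked items missed.
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # ~0.667
```

Note the trade-off visible even in this toy example: recommending fewer items tends to raise precision at the cost of recall, and vice versa.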
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = (1/n) * sum_{i=1}^{n} |p_i - r_i|   (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating
for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on
larger deviations:

RMSE = sqrt( (1/n) * sum_{i=1}^{n} (p_i - r_i)^2 )   (2.16)
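Both error measures can be sketched in a few lines of Python (an illustrative implementation, not code from any of the systems discussed; the sample ratings are invented):

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error over paired rating lists (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); penalizes large deviations more."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

preds, truth = [3.5, 4.0, 2.0], [4.0, 4.0, 4.0]
print(round(mae(preds, truth), 3))   # 0.833
print(round(rmse(preds, truth), 3))  # 1.19
```

The single large error (2.0 vs. 4.0) pushes RMSE well above MAE, which is exactly why RMSE is preferred when large misses matter most.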
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.

1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of
personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and
cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients I+_k, an equation based on the idea of TF-IDF is used:

I+_k = FF_k * IRF_k   (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D   (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe
Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain
ingredient k (M_k):

IRF_k = log(M / M_k)   (3.3)
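As an illustration, the IRF weights can be computed from nothing more than a list of recipes represented as ingredient sets. This is a sketch under our own naming, not the implementation from [3]:

```python
import math

def irf_weights(recipes):
    """Compute IRF_k = log(M / M_k) for every ingredient (Eq. 3.3).
    recipes: list of ingredient sets; M is the total number of recipes,
    M_k the number of recipes containing ingredient k."""
    M = len(recipes)
    counts = {}
    for recipe in recipes:
        for k in recipe:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}

recipes = [{"basil", "tomato"}, {"tomato", "beef"}, {"tomato", "pasta", "basil"}]
w = irf_weights(recipes)
# "tomato" appears in all 3 recipes, so its weight is log(3/3) = 0:
print(w["tomato"], round(w["beef"], 3))
```

As with IDF for words, ubiquitous ingredients (tomato above) receive a weight of zero, while rare ones are weighted highly.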
The user's disliked ingredients I-_k are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted
by I+_k, were computed. The F-measure is computed as follows:

F-measure = (2 * Precision * Recall) / (Precision + Recall)   (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail because the accuracy values
obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This does not correspond to real eating
habits; e.g., if a specific user dislikes the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve on this, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considered the ingredient quantities of a target
recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent: 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper
is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient k is
obtained as follows:

sigma_k = sqrt( (1/n) * sum_{i=1}^{n} (g_k(i) - avg(g_k))^2 )   (3.5)

1 http://cookpad.com
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity
of the ingredient k in recipe i, and avg(g_k) represents the average of g_k(i) (i.e., the average
quantity of the ingredient k over all the recipes in the database). According to the deviation
score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering
the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I-_k, respectively):

Score(R) = sum_{k in R} (I_k * W_k)   (3.6)
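A minimal sketch of this scoring scheme follows; `user_pref` and `weight` are our own names, and the preference and weight values below are invented for illustration:

```python
def recipe_score(recipe_ingredients, user_pref, weight):
    """Score(R) = sum over ingredients k in R of I_k * W_k (Eq. 3.6).
    user_pref: ingredient -> liked (+) / disliked (-) preference value I_k.
    weight:    ingredient -> deviation-based weight W_k."""
    return sum(user_pref.get(k, 0.0) * weight.get(k, 1.0)
               for k in recipe_ingredients)

pref = {"pepper": -1.0, "potato": 0.5}  # user dislikes pepper, mildly likes potato
w = {"pepper": 2.0, "potato": 0.3}      # pepper quantities vary more -> higher weight
print(round(recipe_score({"pepper", "potato"}, pref, w), 2))  # -1.85
```

With the deviation-based weight, the disliked but high-impact pepper dominates the score, which is the behaviour the extended method [21] was designed to capture.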
The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem from collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem.
The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed with the weighted average of deviations from the neighbors' means. Both of
these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure
content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and
those predicted by the content-based method otherwise:

v_{u,i} = r_{u,i}, if user u rated item i; c_{u,i}, otherwise   (3.7)
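Eq. (3.7) amounts to a simple fallback rule, sketched below with hypothetical movie identifiers and made-up ratings:

```python
def pseudo_ratings(actual, content_pred, items):
    """Eq. 3.7: use the real rating when the user provided one, otherwise
    fall back to the content-based prediction, yielding a dense vector."""
    return [actual[i] if i in actual else content_pred[i] for i in items]

items = ["m1", "m2", "m3"]
actual = {"m1": 5}                        # the user only rated movie m1
content_pred = {"m1": 4, "m2": 3, "m3": 2}  # content-based predictions
print(pseudo_ratings(actual, content_pred, items))  # [5, 3, 2]
```

Stacking these dense vectors for all users produces the pseudo ratings matrix V on which the collaborative step operates.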
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the
user has rated many items, the content-based predictions are significantly better than if he has
rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows
similar users with more accurate pseudo vectors to have a higher impact on the predicted rating.
The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another in the approach used to compute user similarity. The hybrid recipe method identifies
a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered, under the assumption that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known news article content-based recommendation system. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help the user to gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function, and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it, and
there is no need to recommend a story the user already knows. If the story does not have any voters, it
cannot be classified by the short-term model, and is passed to the long-term model, explained in
more detail in [23].
This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental recommendation
component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative
approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items
is measured from the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach, two items are considered similar if they were rated in a similar way by the

1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = sum_{p in P} (r_{a,p} - avg(r_a)) * (r_{b,p} - avg(r_b)) / sqrt( sum_{p in P} (r_{a,p} - avg(r_a))^2 * sum_{p in P} (r_{b,p} - avg(r_b))^2 )   (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, avg(r_a) and avg(r_b) are recipe a's and recipe b's
average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = sum_{b in N} sim(a, b) * (r_{u,b} - avg(r_b)) / sum_{b in N} sim(a, b)   (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items
rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted
according to the similarity between b and the target item a, and the predicted rating is normalized by
the sum of similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes. Recommendations
in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant's recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [16, 15].
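The similarity computation of Eq. (4.1) can be sketched as follows. One assumption made here: each item's mean rating is computed over all of its raters, which is one possible reading of the formula; the sample ratings are invented:

```python
import math

def item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes (Eq. 4.1).
    ratings_a, ratings_b: dicts mapping user -> rating for each recipe."""
    common = set(ratings_a) & set(ratings_b)  # users who rated both recipes
    if not common:
        return 0.0
    # Assumption: each recipe's mean is taken over all of its raters.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[u] - mean_a) * (ratings_b[u] - mean_b) for u in common)
    den = math.sqrt(sum((ratings_a[u] - mean_a) ** 2 for u in common) *
                    sum((ratings_b[u] - mean_b) ** 2 for u in common))
    return num / den if den else 0.0

a = {"u1": 5, "u2": 3, "u3": 4}
b = {"u1": 4, "u2": 2, "u3": 4}
print(round(item_similarity(a, b), 3))  # 0.866
```

Because recipe mean vectors can be precomputed offline, this item-to-item similarity only needs to be evaluated against the fixed set of a restaurant's recipes at recommendation time, which is the efficiency argument made above.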
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended
recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

Rating = avgTotal + 0.5, if similarity > 0.8; avgTotal, otherwise   (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It
is important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
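The matching step can be sketched as follows; the feature names and the avgTotal value of 3.8 are invented for illustration, and the function names are ours:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse binary feature vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = (math.sqrt(sum(x * x for x in u.values())) *
            math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def similarity_to_rating(similarity, avg_total):
    """Eq. 4.3: bump the combined user/item average for very close matches."""
    return avg_total + 0.5 if similarity > 0.8 else avg_total

profile = {"italian": 1, "pasta": 1, "tomato": 1, "cheese": 1}  # user profile
recipe = {"italian": 1, "pasta": 1, "tomato": 1}                # candidate recipe
sim = cosine(recipe, profile)
print(round(sim, 3), round(similarity_to_rating(sim, 3.8), 1))  # 0.866 4.3
```

The threshold of 0.8 and the bump of 0.5 come straight from Eq. (4.3); as noted above, this coarse two-level mapping is one source of the approximation error in the reported results.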
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors,
and, lastly, the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in
Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure can be used to attribute weights to the recipes' features
and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to
be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:

IRF_k = log(M / M_k)   (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each, determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3: these are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in the next section, Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector.
In positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector. In negative observations, the feature weights are subtracted.
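The update rule can be sketched as follows (function and variable names are ours, and the IRF weights below are invented for illustration):

```python
def update_prototype(prototype, recipe_features, irf, positive):
    """Add (positive observation) or subtract (negative observation) each
    rated recipe's feature weights on the user's prototype vector, in place."""
    sign = 1.0 if positive else -1.0
    for k in recipe_features:
        prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype

irf = {"garlic": 1.2, "beef": 0.7, "mint": 2.0}  # invented IRF weights
proto = {}
update_prototype(proto, {"garlic", "beef"}, irf, positive=True)   # e.g. rated 4
update_prototype(proto, {"garlic", "mint"}, irf, positive=False)  # e.g. rated 1
print(proto["garlic"], proto["beef"], proto["mint"])  # 0.0 0.7 -2.0
```

Note how garlic, appearing once in a liked recipe and once in a disliked one, cancels out to zero, while the consistently disliked mint ends up strongly negative.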
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

B = (A - min(A)) / (max(A) - min(A)) * (D - C) + C   (4.5)

In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So, the following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A)
- min(A)), the user average was used as default for the recommendation.
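A sketch of this per-user mapping, with an invented similarity interval and rating range:

```python
def min_max(value, old_min, old_max, new_min, new_max):
    """Eq. 4.5: map a similarity in [old_min, old_max] onto the user's
    own rating range [new_min, new_max]."""
    return ((value - old_min) / (old_max - old_min)
            * (new_max - new_min) + new_min)

# Suppose a user's similarities ranged over [-0.2, 0.9] and their past
# ratings over [2, 5] (both intervals invented for this example):
print(round(min_max(0.9, -0.2, 0.9, 2, 5), 2))   # 5.0
print(round(min_max(0.35, -0.2, 0.9, 2, 5), 2))  # 3.5
```

Computing the intervals per user is what lets a "tough rater" and a generous one map the same similarity to different predicted ratings.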
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation, if similarity >= U; average rating, if L <= similarity < U; average rating - standard deviation, if similarity < L   (4.6)

Three different approaches were tested: using the user's rating average and the user's standard
deviation; using the recipe's rating average and the recipe's standard deviation; and using the
combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization of the datasets used in the experiments

                                      Food.com   Epicurious
Number of users                          24741         8117
Number of food items                    226025        14976
Number of rating events                 956826        86574
Number of ratings above avg             726467        46588
Number of groups                           108           68
Number of ingredients                     5074          338
Number of categories                        28           14
Sparsity on the ratings matrix           0.02%        0.07%
Avg rating values                         4.68         3.34
Avg number of ratings per user           38.67        10.67
Avg number of ratings per item            4.23         5.78
Avg number of ingredients per item        8.57         3.71
Avg number of categories per item         2.33         0.60
Avg number of food groups per item        0.87         0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine with more accuracy
the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L
is 0.25.
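This rule can be sketched as follows, with the default thresholds of 0.75 and 0.25 and invented average/deviation values:

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average rating by one standard deviation up or
    down, depending on where the similarity falls relative to U and L."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg

# Invented example: average rating 3.5, standard deviation 0.6.
print(round(rating_from_similarity(0.8, 3.5, 0.6), 1))  # 4.1
print(round(rating_from_similarity(0.5, 3.5, 0.6), 1))  # 3.5
print(round(rating_from_similarity(0.1, 3.5, 0.6), 1))  # 2.9
```

Unlike the Min-Max mapping, this produces only three possible ratings per user/recipe pair, but it needs no per-user similarity interval, which makes it usable even for users with very few ratings.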
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large online
recipe sharing community4. The second dataset is composed of crawled data obtained from a
website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated
recipes, but in order to reduce data sparsity, the dataset has been filtered: all recipes that were rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table
4.1, a statistical characterization for the two datasets, after the filter was applied, is presented.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.

4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen
by the website users when writing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures
4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.⁶ Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by a discussion of the first
experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine
which features are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting aspects of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times using different sets of p observations as the validation set;
ideally, it is repeated until all possible combinations of p observations are tested. The validation results
are averaged over the number of repetitions (see Fig. 5.1). In the experiments
performed in this work, the data was divided into 5 partitions and the process repeated 5 times, a setup
also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
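The 80/20 fold split described above can be sketched as follows; the generator function and its name are illustrative assumptions, not part of the thesis implementation:

```python
import random

def five_fold_splits(ratings, seed=0):
    """Shuffle the rating events once, then yield (train, validation)
    pairs where each of the 5 folds serves as the 20% validation set
    exactly once and the remaining 80% is used for training."""
    events = list(ratings)
    random.Random(seed).shuffle(events)
    fold_size = len(events) // 5
    for k in range(5):
        lo, hi = k * fold_size, (k + 1) * fold_size
        validation = events[lo:hi]
        train = events[:lo] + events[hi:]  # any remainder stays in training
        yield train, validation
```

Each of the 5 iterations evaluates the model trained on 80% of the events against the held-out 20%, and the resulting errors are averaged.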
Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
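The two error measures are standard and translate directly to code; a minimal sketch:

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of rating deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: like MAE, but larger deviations
    are penalized more heavily because errors are squared."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

For example, actual ratings `[4, 3, 5]` against predictions `[4, 5, 4]` give absolute errors of 0, 2 and 1, hence an MAE of 1.0 and an RMSE of √(5/3) ≈ 1.29, illustrating RMSE's heavier weighting of the single large miss.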
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baselines were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item
averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                 Epicurious            Food.com
                                 MAE      RMSE        MAE      RMSE
YoLP Content-based component     0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678      0.3761   0.6834
User Average                     0.6315   0.8338      0.4077   0.6207
Item Average                     0.7701   1.0930      0.4385   0.7043
Combined Average                 0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                      Epicurious                              Food.com
                             Observation      Observation          Observation      Observation
                             User Average     Fixed Threshold      User Average     Fixed Threshold
                             MAE     RMSE     MAE     RMSE         MAE     RMSE     MAE     RMSE
User Avg + User
Standard Deviation           0.8217  1.0606   0.7759  1.0283       0.4448  0.6812   0.4287  0.6624
Item Avg + Item
Standard Deviation           0.8914  1.1550   0.8388  1.1106       0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and
Item Standard Deviation      0.8304  1.0296   0.7824  0.9927       0.4390  0.6506   0.4324  0.6449
Min-Max                      0.8539  1.1533   0.7721  1.0705       0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
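These average-based baselines can be sketched in a few lines; the function name and the dict-of-rating-lists layout are illustrative assumptions:

```python
def baseline_predict(user_ratings, item_ratings, user_id, item_id, mode="combined"):
    """Simple average baselines: predict the user's mean rating, the
    item's mean rating, or the mean of those two means."""
    user_avg = sum(user_ratings[user_id]) / len(user_ratings[user_id])
    item_avg = sum(item_ratings[item_id]) / len(item_ratings[item_id])
    if mode == "user":
        return user_avg
    if mode == "item":
        return item_avg
    return (user_avg + item_avg) / 2  # Combined Average baseline
```

Despite their simplicity, Table 5.1 shows these baselines are competitive, which is why they serve as the bar the experimental component must clear.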
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building
the users' prototype vectors were presented: using the user's average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the rows of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination performed best, so that it could be
further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using the fixed
threshold of 3 to separate positive and negative observations. The second conclusion that can
be drawn from these results is that the combination of both user and item average ratings and
standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                      Epicurious            Food.com
                                      MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616      0.4324   0.7087
Ingredients                           0.7932   1.0054      0.4411   0.6537
Cuisine                               0.8553   1.0810      0.5357   0.7431
Dietary                               0.8772   1.0807      0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods, it is important to determine whether all features are helping to obtain
the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the best-performing method combination was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing a user's prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
row of Table 5.3.
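The per-group storage scheme can be sketched as follows; the helper names and the sparse-dict vector representation are illustrative assumptions, and the sketch assumes feature keys do not collide across groups:

```python
import math

def merge_vectors(user_vectors, groups):
    """Merge the per-feature prototype vectors (e.g. 'ingredients',
    'cuisine', 'dietary') selected for a given feature test."""
    merged = {}
    for group in groups:
        merged.update(user_vectors[group])
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse feature-weight vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination then amounts to passing a different `groups` list to `merge_vectors`, with no need to recompute the stored vectors.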
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information about them is available. Although this is
confirmed in this test (see Table 5.3), it may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating =
    average rating + standard deviation    if similarity ≥ U
    average rating                         if L ≤ similarity < U        (4.6)
    average rating − standard deviation    if similarity < L
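This piecewise mapping translates directly to code; a sketch with the initial thresholds U = 0.75 and L = 0.25 from the text (the function name is an illustrative assumption):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: map Rocchio's cosine similarity onto the rating scale
    around the relevant average, shifted up or down by one standard
    deviation for clearly positive or clearly negative similarities."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

For a user with an average rating of 3.5 and standard deviation 0.6, a similarity of 0.8 yields 4.1, a similarity of 0.5 yields 3.5, and a similarity of 0.1 yields 2.9.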
The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendations and discover the similarity thresholds
that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error values
seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation)
is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating =
    average rating + standard deviation    if similarity ≥ U
    average rating                         if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between tests.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on larger deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is larger, and since RMSE
places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while for others
a lower deviation between predicted and actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
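The MAE-down/RMSE-up effect described above can be verified with a tiny numeric example. The error values below are hypothetical, chosen only to illustrate the mechanism, not taken from the thesis results:

```python
import math

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical prediction errors over four rating events:
# a higher threshold misses slightly on every event, while a lower
# threshold is exact more often but misses by a full standard
# deviation when it is wrong.
high_threshold_errors = [0.5, 0.5, 0.5, 0.5]
low_threshold_errors = [0.0, 0.0, 0.0, 1.2]

assert mae(low_threshold_errors) < mae(high_threshold_errors)    # 0.3 vs 0.5
assert rmse(low_threshold_errors) > rmse(high_threshold_errors)  # 0.6 vs 0.5
```

Squaring magnifies the single large miss, so the two metrics move in opposite directions as the threshold is lowered.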
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and whether
the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were observed towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Considering the small size of this dataset, and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset
there was not enough data on users with high deviations for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a certain number of
reviews. To perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendations. The Epicurious
dataset contains 71 users who rated over 40 recipes; this was the highest threshold
chosen for this dataset, in order to maintain a considerable number of users over which to average the
recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found who rated over 100
recipes, and since the results of this experiment showed a consistent drop in the measured errors,
as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendations, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
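The incremental procedure described above — moving one review at a time from the validation set into the training set and recording the error — can be sketched as follows. The stand-in user-average predictor replaces the thesis's Rocchio component only so the sketch is self-contained; both function names are illustrative assumptions:

```python
def learning_curve(user_reviews, predict, max_train=40):
    """For one user, grow the training set one review at a time (the
    rest stays in the validation set) and record the mean absolute
    error at each step, mimicking a user rating recipes over time."""
    curve = []
    for n in range(1, min(max_train, len(user_reviews))):
        train, validation = user_reviews[:n], user_reviews[n:]
        errors = [abs(rating - predict(train, recipe))
                  for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

def user_average_predict(train, recipe):
    # Stand-in predictor: the mean of the ratings seen so far.
    return sum(rating for _, rating in train) / len(train)
```

In the thesis experiments, the per-user curves obtained this way are averaged over the selected user groups (71, 1,571, or 269 users) to produce Figures 5.8 through 5.10.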
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results in transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
of the experimental recommendation component.
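Assuming the Inverse Recipe Frequency of Eq. 4.4 is the recipe-corpus analogue of the classical IDF weight (an assumption; the exact formula is defined in Chapter 4), the weighting can be sketched as:

```python
import math

def inverse_recipe_frequency(recipes):
    """Assumed IDF-style weighting: features occurring in many recipes
    are weighted down, rarer (more discriminative) features weighted up.
    `recipes` is a list of feature lists, one per recipe."""
    n = len(recipes)
    counts = {}
    for features in recipes:
        for f in set(features):  # count each feature once per recipe
            counts[f] = counts.get(f, 0) + 1
    return {f: math.log(n / c) for f, c in counts.items()}
```

A feature such as "salt" that appears in every recipe receives weight log(1) = 0, carrying no information about a user's preferences, while a rare feature like "truffle" receives a high weight.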
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
These being two datasets with very different characteristics, not improving on the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors; adding the major difference in dataset
sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendation, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that using all features
combined outperforms every individual feature and all pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e.,
lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these
features have on the recommendations is another interesting direction to pursue in the future,
when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors; the vector with the highest similarity
represents the class where, according to the user's preferences, the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute to it a predicted
rating.
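A minimal sketch of this proposed multi-class variant, under the assumption that one prototype vector is kept per rating value (all names are hypothetical, since this extension was not implemented in the thesis):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature-weight vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(class_vectors, recipe_vector):
    """Proposed variant: class_vectors maps each rating value to a
    prototype vector built from the recipes the user rated with that
    value. The predicted rating is the class most similar to the
    recipe, so no similarity-to-rating conversion step is needed."""
    return max(class_vectors,
               key=lambda rating: cosine(class_vectors[rating], recipe_vector))
```

For instance, a user whose 5-star prototype is dominated by "garlic" and whose 1-star prototype by "liver" would see a garlic-heavy recipe predicted as a 5.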
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7
[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation. Volume 8444, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Chapter 1
Introduction
Information filtering systems [1] seek to expose users to the information items that are relevant to
them. Typically, some type of user model is employed to filter the data. Building on developments in
Information Filtering (IF), the more modern recommendation systems [2] share the same purpose,
but instead of presenting all the relevant information to the user, only the items that best fit the
user's preferences are chosen. The process of filtering large amounts of data in a (semi-)automated
way, according to user preferences, can provide users with a vastly richer experience.
Recommendation systems are already very popular in e-commerce websites and in online services
related to movies, music, books, social bookmarking, and product sales in general, and
new ones are appearing every day. All these areas have one thing in common: users want to explore
the space of options, find interesting items, or even discover new things.
Still, food recommendation is a relatively new area, with few systems deployed in real settings
that focus on user preferences. The study of current methods for supporting the development of
recommendation systems, and of how they can apply to food recommendation, is a topic of great
interest.
In this work, the applicability of content-based methods to personalized food recommendation is
explored. To do so, a recommendation system and an evaluation benchmark were developed. The
study of new variations of content-based methods adapted to food recommendation is validated
with the use of performance metrics that capture the accuracy of the predicted ratings. In
order to validate the results, the experimental component is directly compared with a set of baseline
methods, amongst them the YoLP content-based and collaborative components.

The experiments performed in this work seek new variations of content-based methods using the
well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words
in a document led to the variation of TF-IDF developed in [3]. That work presented good results in
retrieving the user's favorite ingredients, which raised the following question: could these results be
further improved?
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed: the algorithm's learning curve and the impact of the standard deviation on the
recommendation error were analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal¹ (YoLP), and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for the items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, in order to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview on recommendation systems, introducing various fundamental concepts and describing some of the most popular recommendation and evaluation methods. In Chapter 3, four previously proposed recommendation approaches are analysed, where interesting features in the context of personalized food recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the developed system are described. The recommendation methods are explained in detail, and the datasets are introduced and analysed. Chapter 5 contains the details and results of the experiments performed in this work, and describes the evaluation metrics used to validate the algorithms implemented in the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order to better understand the proposed objectives and the following chapter on related work. These concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the following categories [2]:
• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach for developing recommendation systems. Collaborative methods focus more on rating-based recommendations. Content-based approaches, instead, relate more to classical Information Retrieval based methods, and focus on keywords as content descriptors to generate recommendations. Because of this, content-based methods are very popular when recommending documents, news articles, or web pages, for example.
Knowledge-based systems suggest products based on inferences about users' needs and preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based. Both approaches are similar in their recommendation process: the user specifies the requirements and the system tries to identify the solution. However, constraint-based systems recommend items using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem that is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview on hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching up the attributes of an object with a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known.

• Unstructured Data: attributes do not have a well-known set of values. Content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free-text. As mentioned previously, content needs to be analysed, and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and their weights relate to the relevance associated between them and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed by two terms. The first, Term Frequency (TF), is defined as follows:

TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}    (2.1)

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in various documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log \frac{N}{n_i}    (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

w_{i,j} = TF_{i,j} \times IDF_i    (2.3)
It is important to notice that TF-IDF does not identify the context where the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when giving a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:
w_{i,j} = \frac{TF\text{-}IDF_{i,j}}{\sqrt{\sum_{z=1}^{K} (TF\text{-}IDF_{z,j})^2}}    (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = \frac{\sum_k w_{k,a} \, w_{k,b}}{\sqrt{\sum_k w_{k,a}^2} \sqrt{\sum_k w_{k,b}^2}}    (2.5)
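The normalization in Eq. (2.4) and the similarity in Eq. (2.5) translate directly to code. A minimal sketch over sparse keyword vectors, represented as dictionaries mapping terms to weights (function names are illustrative):

```python
import math

def normalize(vector):
    """Cosine normalization (Eq. 2.4): divide each weight by the L2 norm."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    return {term: w / norm for term, w in vector.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors (Eq. 2.5)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

document = normalize({"pasta": 2.0, "tomato": 1.0})
profile = {"pasta": 1.0, "basil": 1.0}
match = cosine_similarity(document, profile)
```

The sparse representation only stores non-zero weights, so the dot product iterates over one vector and looks terms up in the other.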
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. (2.5)). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, being T the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|}    (2.6)

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the
influence of the positive and negative examples. The document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
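A sketch of the prototype computation in Eq. (2.6), over sparse TF-IDF vectors. The defaults β = 16 and γ = 4 are values commonly used in the text-classification literature, not parameters reported in this work, and both example lists are assumed non-empty:

```python
def rocchio_prototype(positives, negatives, beta=16.0, gamma=4.0):
    """Prototype vector for one class (Eq. 2.6): the beta-weighted average of
    the positive example vectors minus the gamma-weighted average of the
    negative ones. Assumes both example lists are non-empty."""
    terms = set()
    for doc in positives + negatives:
        terms.update(doc)
    prototype = {}
    for k in terms:
        pos = sum(d.get(k, 0.0) for d in positives) / len(positives)
        neg = sum(d.get(k, 0.0) for d in negatives) / len(negatives)
        prototype[k] = beta * pos - gamma * neg
    return prototype

# Toy TF-IDF vectors for items the user liked / disliked
positives = [{"pasta": 0.8, "tomato": 0.6}, {"pasta": 0.5}]
negatives = [{"liver": 0.9}]
prototype = rocchio_prototype(positives, negatives)
```

A new document would then be assigned to the class whose prototype has the highest cosine similarity with its vector.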
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities, gathered from previously observed data, in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c) \, P(d|c)}{P(d)}    (2.7)

When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)}    (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier has very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, marking whether or not the word appears in a document. The second, typically referred to as the multinomial event model, counts the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)}    (2.9)

In the formula, N(d_i,t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
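A minimal multinomial Naive Bayes sketch. Laplace smoothing is a standard addition to keep unseen term counts from zeroing the product; it is not discussed in the text above. Classification works in log space, which is equivalent to maximizing Eq. (2.9) and avoids numeric underflow:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate P(c) and, with Laplace smoothing alpha, P(t|c)."""
    vocab = {t for d in docs for t in d}
    priors, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + alpha) / (total + alpha * len(vocab))
                   for t in vocab}
    return priors, cond

def classify(doc, priors, cond):
    """Pick the class maximizing log P(c) + sum_k N(d, t_k) log P(t_k|c),
    the logarithm of Eq. (2.9); words outside the vocabulary are ignored."""
    best_class, best_score = None, float("-inf")
    for c in priors:
        score = math.log(priors[c])
        for t, n in Counter(doc).items():
            if t in cond[c]:
                score += n * math.log(cond[c][t])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

train_docs = [["great", "tasty"], ["tasty", "fresh"], ["bland", "awful"]]
train_labels = ["relevant", "relevant", "irrelevant"]
priors, cond = train_multinomial_nb(train_docs, train_labels)
```

Here the two classes play the role of "relevant" and "irrelevant" items mentioned above.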
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they do not have a training phase and all the computation is made at classification time.
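A sketch of k-nearest-neighbor classification over structured data with the Euclidean distance (function name and toy points are illustrative):

```python
import math
from collections import Counter

def knn_classify(item, training, k=3, distance=math.dist):
    """Label an item by majority vote among its k nearest training examples.
    All the work happens here, at classification time, which is the drawback
    noted above."""
    nearest = sorted(training, key=lambda ex: distance(item, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
label = knn_classify((0.5, 0.5), training)
```

For items represented in the VSM, the distance argument would be replaced by one minus the cosine similarity of Eq. (2.5).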
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user, based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example on how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} \, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2} \sqrt{\sum_{s \in S} r_{b,s}^2}}    (2.10)

In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2 \sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}}    (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using any of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (2.12)

In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
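The whole procedure, Pearson similarity (Eq. 2.11) followed by the prediction function (Eq. 2.12), can be sketched as follows, reusing some of the ratings of Table 2.1 (function names are illustrative):

```python
import math

def mean_rating(ratings, user):
    return sum(ratings[user].values()) / len(ratings[user])

def pearson(ratings, a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    ma, mb = mean_rating(ratings, a), mean_rating(ratings, b)
    num = sum((ratings[a][s] - ma) * (ratings[b][s] - mb) for s in common)
    den = (math.sqrt(sum((ratings[a][s] - ma) ** 2 for s in common)) *
           math.sqrt(sum((ratings[b][s] - mb) ** 2 for s in common)))
    return num / den if den else 0.0

def predict(ratings, a, item, neighbours):
    """Predicted rating (Eq. 2.12): the user's mean plus the similarity-weighted,
    mean-centered ratings that the neighbours gave to the item."""
    num = den = 0.0
    for b in neighbours:
        if item in ratings[b]:
            sim = pearson(ratings, a, b)
            num += sim * (ratings[b][item] - mean_rating(ratings, b))
            den += abs(sim)
    base = mean_rating(ratings, a)
    return base + num / den if den else base

ratings = {"Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
           "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
           "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5}}
prediction = predict(ratings, "Alice", "Item5", ["User1", "User2"])
```

Both neighbours rate Item5 above their own averages, so Alice's prediction lands above her average of 4.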
Different recommendation systems may take different approaches, in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input, as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually, or learned dynamically. This design can be applied with two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items, else use collaborative methods).
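A parallelized weighted hybrid reduces to a few lines: each component predicts independently, and a normalized weighted sum combines the results. The two lambda components below are placeholders standing in for real predictors, not YoLP's actual components:

```python
def weighted_hybrid(components, weights, user, item):
    """Parallelized hybridization: combine independent predictions
    by a normalized weighted sum."""
    total = sum(weights)
    return sum(w * predict(user, item)
               for predict, w in zip(components, weights)) / total

# Placeholder components returning fixed ratings
content_based = lambda user, item: 4.0
collaborative = lambda user, item: 3.0
score = weighted_hybrid([content_based, collaborative], [0.6, 0.4],
                        "alice", "item5")
```

The weights could also be adjusted dynamically, e.g. lowering the collaborative component's weight while few ratings exist.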
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = \frac{tp}{tp + fn}    (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
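The four measures can be sketched directly from Eqs. (2.13) to (2.16), with set-based precision and recall (function names are illustrative):

```python
import math

def precision_recall(recommended, relevant):
    """Precision (Eq. 2.13) and recall (Eq. 2.14) over item sets."""
    tp = len(recommended & relevant)
    return tp / len(recommended), tp / len(relevant)

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16); larger deviations weigh more."""
    return math.sqrt(sum((p - r) ** 2
                         for p, r in zip(predicted, actual)) / len(actual))

p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d"})
```

Note that, on the same prediction errors, RMSE is always greater than or equal to MAE, with equality only when all errors have the same magnitude.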
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on a user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I_k^+, an equation based on the idea of TF-IDF is used:

I_k^+ = FF_k \times IRF_k    (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D}    (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k}    (3.3)
The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing history with which the user has never cooked.
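A sketch of the preference extraction in Eqs. (3.1) to (3.3). The names and toy data are illustrative, and the cooked recipes are assumed to be drawn from the recipe database:

```python
import math
from collections import Counter

def favourite_ingredients(cooked_recipes, all_recipes, days):
    """Score each used ingredient by FF_k * IRF_k (Eq. 3.1): its frequency of
    use over the period of `days` (Eq. 3.2), times the inverse recipe
    frequency (Eq. 3.3)."""
    recipe_freq = Counter()                 # M_k: recipes containing k
    for recipe in all_recipes:
        recipe_freq.update(set(recipe))
    use_freq = Counter(k for recipe in cooked_recipes for k in recipe)
    m = len(all_recipes)
    return {k: (f / days) * math.log(m / recipe_freq[k])
            for k, f in use_freq.items()}

all_recipes = [["salt", "tomato"], ["salt", "rice"], ["salt", "basil"]]
cooked = [["salt", "tomato"]]
scores = favourite_ingredients(cooked, all_recipes, days=7)
```

An ingredient present in every recipe, like salt here, scores zero regardless of how often it is used, mirroring the role of IDF for frequent keywords.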
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely, and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from "love" to "hate". To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I_k^+, were computed. The F-measure is computed as follows:

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual users' favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed by the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.
1 http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams from two different ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σk =
radicradicradicradic 1
n
nsumi=1
(gk(i) minus gk)2 (35)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity
of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed
average quantity of the ingredient k over all the recipes in the database). According to the deviation
score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering
the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+k and I-k, respectively):

\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \quad (3.6)

The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
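As a minimal sketch of the scoring scheme in Eqs. (3.5) and (3.6), with illustrative function names and dictionaries keyed by ingredient id:

```python
import math

def ingredient_std(quantities):
    """Population standard deviation of an ingredient's quantity over the
    n recipes that contain it (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Eq. 3.6: sum of I_k * W_k over the recipe's ingredients, where
    `preference[k]` holds the (positive or negative) preference I_k and
    `weight[k]` holds the deviation-based weight W_k."""
    return sum(preference[k] * weight[k] for k in recipe_ingredients)
```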
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly
reduces the impact that sparse data has on prediction accuracy. The domain of movie
recommendations was used to demonstrate this hybrid approach.

In the pure content-based method, the prediction task was treated as a text-categorization problem.
The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the
prediction is computed as the weighted average of deviations from the neighbors' means. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure
content-based predictor and the pure collaborative method.

CBCF basically consists in performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u where available, and
those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \quad (3.7)
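Eq. (3.7) can be sketched as follows, with `actual` holding the known ratings and `content_predict` standing in for the content-based predictor (names and data layout are illustrative, not from [20]):

```python
def pseudo_ratings(actual, content_predict, users, items):
    """Build the dense pseudo ratings matrix V of Eq. (3.7): the actual
    rating r_ui where the user rated the item, and the content-based
    prediction c_ui everywhere else."""
    V = {}
    for u in users:
        rated = actual.get(u, {})
        V[u] = {i: rated[i] if i in rated else content_predict(u, i)
                for i in items}
    return V
```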
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the
user has rated many items, the content-based predictions are significantly better than if he has rated
only a few items. Lastly, the prediction is computed using a hybrid correlation weight that allows
similar users with more accurate pseudo vectors to have a higher impact on the predicted rating.
The hybrid correlation weight is explained in more detail in [20].

The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The
content-boosted collaborative filtering system presented the best results, with an MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender
algorithms is evaluated based on a dataset of 43000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst
others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
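This simplistic decomposition can be sketched as follows (illustrative names; `recipes` maps a recipe id to its set of ingredients and `ratings` is a list of (recipe id, rating) pairs):

```python
def ingredient_scores(recipes, ratings):
    """Score each ingredient as the average rating of the recipes in
    which it occurs (the decomposition of Figure 3.1)."""
    totals, counts = {}, {}
    for recipe_id, rating in ratings:
        for k in recipes[recipe_id]:
            totals[k] = totals.get(k, 0.0) + rating
            counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}
```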
As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another by the approach used to compute user similarity. The hybrid recipe method
identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the rating scores after the recipe is broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered: it is assumed that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has, in this case, the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based news article recommendation system. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help him gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new, unlabeled item, the
algorithm compares it to all stored items using a similarity function and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
does not need a recommendation for a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].
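The short-term voting scheme can be sketched as follows; the threshold values, function names, and data layout are illustrative, not taken from [23]:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_story_score(new_vec, stories, min_sim=0.3, max_sim=0.95):
    """Stories above `min_sim` become voters; the prediction is the
    similarity-weighted average of the voters' scores; a voter above
    `max_sim` marks the new story as already known.
    `stories` is a list of (vector, score) pairs."""
    voters = [(cosine(new_vec, v), s) for v, s in stories]
    voters = [(sim, s) for sim, s in voters if sim >= min_sim]
    if not voters:
        return None, False  # no voters: defer to the long-term model
    known = any(sim >= max_sim for sim, _ in voters)
    score = sum(sim * s for sim, s in voters) / sum(sim for sim, _ in voters)
    return score, known
```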
This issue should be taken into consideration in food recommendations, as users are usually not
interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental
recommendation component, where various approaches are explored to adapt Rocchio's algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item
collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both
users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair
of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach, two items are considered similar if they were rated in a similar way by the
same group of users.

1 https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\text{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \quad (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings,
respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\text{pred}(u, a) = \frac{\sum_{b \in N} \text{sim}(a, b) \times (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} \text{sim}(a, b)} \quad (4.2)

2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items
rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted
according to the similarity between b and the target item a, and the predicted rating is normalized by the
sum of similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes.
Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant's recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The
recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values of the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values
to the profile vector.
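Since both the recipe vector and the profile vector are binary and sparse, the cosine of Eq. (2.5) reduces to a set computation; a minimal sketch with hypothetical feature ids:

```python
import math

def cosine_binary(u, v):
    """Cosine similarity between two binary sparse vectors, represented
    as the sets of features whose value is 1."""
    if not u or not v:
        return 0.0
    return len(u & v) / math.sqrt(len(u) * len(v))

# Hypothetical user profile (features of positively rated recipes) and recipe.
profile = {"category:dessert", "ingredient:apple", "ingredient:cinnamon"}
recipe = {"category:dessert", "ingredient:apple", "ingredient:pear"}
```

Here the two vectors share 2 of their 3 features each, so the similarity is 2/3.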
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \quad (4.3)
where avgTotal represents the combined user and item average for each recommendation. It
is important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors;
lastly, the problem of transforming a similarity measure into a rating value is presented, and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in
Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipes' features
and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to
be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as defined in [3]:
IRF_k = \log \frac{M}{M_k} \quad (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
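Computing the IRF weights of Eq. (4.4) over the whole dataset can be sketched as follows (each recipe represented as a set of ingredient ids; names are illustrative):

```python
import math

def irf_weights(recipes):
    """IRF_k = log(M / M_k), where M is the number of recipes and M_k is
    the number of recipes containing ingredient k (Eq. 4.4)."""
    M = len(recipes)
    counts = {}
    for r in recipes:
        for k in r:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```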
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each, determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations, and the higher rating values are positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3: these are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in the next Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set: if a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype
vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users on recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented:
Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \quad (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. The following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (maximum value of A
minus minimum value of A), the user average was used as the default for the recommendation.
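A sketch of the mapping of Eq. (4.5); the caller is assumed to fall back to the user average when the similarity interval is degenerate, as described above:

```python
def min_max(a, a_min, a_max, c, d):
    """Min-Max normalization (Eq. 4.5): map a value from the observed
    interval [a_min, a_max] into the target rating range [c, d]."""
    if a_max == a_min:
        raise ValueError("degenerate similarity interval; use the user average")
    return (a - a_min) / (a_max - a_min) * (d - c) + c
```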
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

\text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \quad (4.6)
Three different approaches were tested: using the user's rating average and the user's standard
deviation; using the recipe's rating average and the recipe's standard deviation; and using the
combined average of the user and recipe averages and standard deviations.
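Eq. (4.6) can be sketched directly, with the default thresholds U = 0.75 and L = 0.25 used in the initial experiments (the function name is ours):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average rating one standard deviation up or down,
    depending on where the similarity falls relative to U and L."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```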
This approach is very intuitive: when the similarity value between the recipe's features and the
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine the final recommended
rating for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L
is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
Number of users                         24741         8117
Number of food items                   226025        14976
Number of rating events                956826        86574
Number of ratings above avg            726467        46588
Number of groups                          108           68
Number of ingredients                    5074          338
Number of categories                       28           14
Sparsity on the ratings matrix          0.02%        0.07%
Avg rating values                        4.68         3.34
Avg number of ratings per user          38.67        10.67
Avg number of ratings per item           4.23         5.78
Avg number of ingredients per item       8.57         3.71
Avg number of categories per item        2.33         0.60
Avg number of food groups per item       0.87         0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the
recommendation system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large
online4 recipe sharing community. The second dataset is composed of crawled data obtained from a
website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated
recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes that were rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table
4.1, a statistical characterization of the two datasets after the filter was applied is presented.
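The filtering step can be sketched as below; repeating the pass until the counts stabilize is our assumption, since removing recipes can push a user below its own threshold (the thresholds mirror the "no more than 3" / "no more than 5" rules above):

```python
def filter_dataset(events, min_item=4, min_user=6):
    """Keep only items rated at least `min_item` times and users with at
    least `min_user` ratings. `events` is a list of
    (user, item, rating) tuples; the pass repeats until stable."""
    while True:
        users, items = {}, {}
        for u, i, _ in events:
            users[u] = users.get(u, 0) + 1
            items[i] = items.get(i, 0) + 1
        kept = [(u, i, r) for u, i, r in events
                if users[u] >= min_user and items[i] >= min_item]
        if len(kept) == len(events):
            return kept
        events = kept
```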
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way that ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen
by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures
4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides an insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the chosen number of folds was 5, so the process is repeated 5 times; this is also
known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
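The fold construction can be sketched as follows (an interleaved split; the actual partitioning strategy used in the experiments is not specified, so this layout is an assumption):

```python
def k_fold(events, k=5):
    """Yield (training, validation) pairs: each of the k folds serves once
    as the 20% validation set, while the rest form the 80% training set."""
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, f in enumerate(folds) if j != i for e in f]
        yield training, validation
```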
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
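The two deviation measures computed by the evaluation module can be sketched as:

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def rmse(predicted, actual):
    """Root mean squared error, which penalizes large deviations more."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))
```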
5.2 Baselines and First Results
In order to validate the experimental context-based algorithms explored in this work first some base-
lines need to be computed Using the 5-fold cross-validation the YoLP recommendation components
presented in the previous Chapter (Section 41 and 42) were evaluated Besides these methods a
few simple baselines metrics were also computed using the direct values of specific dataset aver-
ages as the predicted rating for the recommendations The averages computed were the following
user average rating recipe average rating and the combined average of the user and item aver-
ages ie (UserAvg + ItemAvg)2 In other words when receiving the userID and recipeID as
36
Table 51 Baselines
Epicurious FoodcomMAE RMSE MAE RMSE
YoLP Content-basedcomponent
06389 08279 03590 06536
YoLP Collaborativecomponent
06454 08678 03761 06834
User Average 06315 08338 04077 06207Item Average 07701 10930 04385 07043Combined Average 06628 08572 04180 06250
Table 5.2: Test results

                                           Epicurious                                    Food.com
                                           Obs. User Average   Obs. Fixed Threshold     Obs. User Average   Obs. Fixed Threshold
                                           MAE      RMSE       MAE      RMSE            MAE      RMSE       MAE      RMSE
User Avg + User Standard Deviation         0.8217   1.0606     0.7759   1.0283          0.4448   0.6812     0.4287   0.6624
Item Avg + Item Standard Deviation         0.8914   1.1550     0.8388   1.1106          0.4561   0.7251     0.4507   0.7207
User/Item Avg + User and Item Std. Dev.    0.8304   1.0296     0.7824   0.9927          0.4390   0.6506     0.4324   0.6449
Min-Max                                    0.8539   1.1533     0.7721   1.0705          0.6648   0.9847     0.6303   0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
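A minimal sketch of these average-based baselines, assuming the training data is available as (userID, itemID, rating) tuples (names are illustrative, not the actual YoLP code):

```python
from collections import defaultdict

def build_averages(training):
    # training: iterable of (user_id, item_id, rating) tuples.
    user_ratings, item_ratings = defaultdict(list), defaultdict(list)
    for u, i, r in training:
        user_ratings[u].append(r)
        item_ratings[i].append(r)
    user_avg = {u: sum(rs) / len(rs) for u, rs in user_ratings.items()}
    item_avg = {i: sum(rs) / len(rs) for i, rs in item_ratings.items()}
    return user_avg, item_avg

def predict_combined(user_avg, item_avg, u, i, default=3.0):
    # Combined Average baseline: (UserAvg + ItemAvg) / 2, falling back
    # to a mid-scale default for unseen users or items.
    return (user_avg.get(u, default) + item_avg.get(i, default)) / 2

train = [("u1", "r1", 4), ("u1", "r2", 5), ("u2", "r1", 2)]
ua, ia = build_averages(train)
print(predict_combined(ua, ia, "u1", "r1"))  # (4.5 + 3.0) / 2 = 3.75
```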
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation: User Average and Observation: Fixed Threshold.
As also detailed in Section 4.3, a few different methods are used to convert the similarity value
returned by Rocchio's algorithm into a rating value. These methods are represented in the line
entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item
Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
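The two ways of splitting a user's rated recipes into positive and negative observations can be sketched as follows (illustrative code, not the thesis implementation; ratings are assumed to be on a 1-5 scale, with ratings at the threshold counted as positive):

```python
def split_observations(ratings, threshold=None):
    # ratings: dict mapping recipe_id -> rating given by this user.
    # threshold=None -> "Observation: User Average" (the user's own mean);
    # threshold=3    -> "Observation: Fixed Threshold" (middle of the scale).
    if threshold is None:
        threshold = sum(ratings.values()) / len(ratings)
    positives = [r for r, v in ratings.items() if v >= threshold]
    negatives = [r for r, v in ratings.items() if v < threshold]
    return positives, negatives

user_ratings = {"pasta": 5, "salad": 2, "soup": 4, "stew": 3}
print(split_observations(user_ratings, threshold=3))
# fixed threshold of 3: pasta, soup and stew are positive; salad is negative
```

With the user-average variant, the same user (mean rating 3.5) would instead see "stew" fall into the negative set, which is how the two strategies produce different prototype vectors.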
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so that it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using
a fixed threshold of 3 to separate the positive and negative observations. The second conclusion that
can be drawn from these results is that the combination of both user and item average ratings and
standard deviations has the overall lowest error values.

Although these first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                   Epicurious           Food.com
                                   MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616      0.4324   0.7087
Ingredients                        0.7932   1.0054      0.4411   0.6537
Cuisine                            0.8553   1.0810      0.5357   0.7431
Dietary                            0.8772   1.0807      0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients,
cuisine, and dietary. In content-based methods, it is important to determine whether all features are
helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the best-performing method combination was the
following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
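The idea of storing one sub-vector per feature type and merging them on demand can be sketched as follows (illustrative dictionaries standing in for sparse weight vectors; not the actual implementation):

```python
def merge_vectors(subvectors, active_features):
    # Each sub-vector is a dict of term -> weight for one feature type.
    # Merging takes the union of the selected sub-vectors, so any feature
    # combination can be tested without rebuilding the prototypes.
    merged = {}
    for feature in active_features:
        merged.update(subvectors[feature])
    return merged

# One stored prototype per feature type, for a single user.
user_prototype = {
    "ingredients": {"ing:tomato": 0.8, "ing:basil": 0.5},
    "cuisine": {"cui:italian": 0.9},
    "dietary": {"diet:vegetarian": 0.7},
}
# Testing the Ingredients + Cuisine combination is one merge away:
print(merge_vectors(user_prototype, ["ingredients", "cuisine"]))
```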
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them. Although this is confirmed
in this test (see Table 5.3), that may not always be the case. Some features, like for example the
price of the meal, can increase the correlation between the user preferences and items he dislikes,
so it is important to test the impact of every new feature before implementing it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:
Rating =
    average rating + standard deviation,   if similarity >= U
    average rating,                        if L <= similarity < U
    average rating - standard deviation,   if similarity < L        (4.6)
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study their impact on the recommendations and discover the similarity case thresholds
that return the lowest error values.
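A direct sketch of the mapping in Eq. 4.6, with the two thresholds exposed as parameters so they can be varied during the test (illustrative code):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Piecewise mapping from a cosine similarity in [0, 1] to a rating,
    # as in Eq. 4.6: high similarity adds the standard deviation to the
    # average, low similarity subtracts it, and the middle band keeps
    # the plain average.
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

print(similarity_to_rating(0.9, avg=3.5, std=0.8))  # 4.3
print(similarity_to_rating(0.5, avg=3.5, std=0.8))  # 3.5
print(similarity_to_rating(0.1, avg=3.5, std=0.8))  # 2.7
```

Sweeping `lower` over [0, 0.25] and `upper` over [0, 0.75] between cross-validation runs reproduces the threshold-variation experiment described in this section.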
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error values
seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating =
    average rating + standard deviation,   if similarity >= U
    average rating,                        if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation
tests multiple times on the experimental recommendation component, adjusting the upper similarity
value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better results
when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating in all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimension of this dataset and
the lighter density of points in the graph towards the higher values of standard deviation, there was
probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users
with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the
objective of this test is to simulate the continuous learning of the algorithm, using the datasets
studied in this work, and to analyse whether the recommendation error starts to converge after a
determined number of reviews. To perform this test, the datasets were first analysed to find a group
of users with enough rated recipes to study the improvements in the recommendations. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable number of users to average the
recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendations, although there is no clear number of rated recipes that marks
a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
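The simulation loop behind these learning curves can be sketched as follows (illustrative code; `build_prototype` and `predict` are hypothetical stand-ins for the Rocchio steps described earlier, here replaced by trivial toy functions so the loop structure is runnable):

```python
def learning_curve(reviews, build_prototype, predict, max_train=40):
    # reviews: one user's rated recipes in order, as (recipe, rating) pairs.
    # Each round moves one more review from the validation set to the
    # training set, simulating the user rating recipes over time, and
    # records the MAE on the remaining validation reviews.
    errors = []
    for n in range(1, min(max_train, len(reviews))):
        train, validation = reviews[:n], reviews[n:]
        prototype = build_prototype(train)
        abs_errs = [abs(predict(prototype, recipe) - rating)
                    for recipe, rating in validation]
        errors.append(sum(abs_errs) / len(abs_errs))  # MAE at this round
    return errors

# Toy stand-ins: the "prototype" is just the mean rating seen so far,
# and the prediction always returns it.
build = lambda train: sum(r for _, r in train) / len(train)
predict = lambda proto, recipe: proto
print(learning_curve([("a", 4), ("b", 4), ("c", 5), ("d", 4)], build, predict))
```

Averaging the resulting per-round MAE values over all selected users yields one point per round of the curves in Figs. 5.8-5.10.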
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity
value returned by the algorithm into a rating value, needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best
results used a fixed threshold to differentiate positive and negative observations. The combination of
both user and item average ratings and standard deviations demonstrated the best results for
transforming the similarity value into a rating value. Combined, these approaches returned the best
performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results
in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information
only contained the main ingredients, which were chosen by the user at the moment of the review,
as opposed to the full ingredient information that recipes have in the Food.com dataset. This
removes a lot of detail, both in the recipes and in the prototype vectors; adding the major difference
in the dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendations, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this
hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day
(i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the
impact that these features have on the recommendations is another interesting point to approach in
the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so that, according to the user's preferences,
the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute to it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science
and information systems - a landscape of research. In E-Commerce and Web Technologies,
pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web,
4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive
algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420,
1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems.
In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403,
1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl.
Getting to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based
collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69,
2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and
User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by
considering the user's preference and ingredient quantity of target recipe. In Proceedings of the
International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In
Proceedings of the 18th International Conference on User Modeling, Adaptation and
Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and
User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe
recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS,
2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model
selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Besides the validation of the content-based algorithm explored in this work, other tests were
also performed. The algorithm's learning curve and the impact of the standard deviation on the
recommendation error were also analysed. Furthermore, a feature test was performed to discover
the feature combination that best characterizes the recipes, providing the best recommendations.
The study of this problem was supported by a scholarship at INOV, in a project related to the
development of a recommendation system in the food domain. The project is entitled Your Lunch
Pal1 (YoLP) and it proposes to create a mobile application that allows the customer of a restaurant
to explore the available items in the restaurant's menu, as well as to receive, based on his consumer
behaviour, recommendations specifically adjusted to his personal taste. The mobile application also
allows clients to order and pay for items electronically. To this end, the recommendation system
in YoLP needs to understand the preferences of users, through the analysis of food consumption data
and context, to be able to provide accurate recommendations to a customer of a certain restaurant.
1.1 Dissertation Structure
The rest of this dissertation is organized as follows. Chapter 2 provides an overview of
recommendation systems, introducing various fundamental concepts and describing some of the most
popular recommendation and evaluation methods. In Chapter 3, four previously proposed
recommendation approaches are analysed, and interesting features in the context of personalized food
recommendation are highlighted. In Chapter 4, the modules that compose the architecture of the
developed system are described; the recommendation methods are explained in detail, and the datasets
are introduced and analysed. Chapter 5 contains the details and results of the experiments performed
in this work, and describes the evaluation metrics used to validate the algorithms implemented in
the recommendation components. Lastly, in Chapter 6, an overview of the main aspects of this work
is given and a few topics for future work are discussed.
1 http://www.yolp.eu/en-us
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the related work in the following chapter. These
concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:

• Knowledge-based recommendation systems;
• Content-based recommendation systems;
• Collaborative recommendation systems;
• Hybrid recommendation systems.
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular
approach for developing recommendation systems. Collaborative methods focus more on rating-based
recommendations. Content-based approaches, instead, relate more to classical Information Retrieval
methods and focus on keywords as content descriptors to generate recommendations. Because of this,
content-based methods are very popular when recommending documents, news articles, or web pages,
for example.
Knowledge-based systems suggest products based on inferences about the user's needs and
preferences. Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify a solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of
collaborative and content-based systems, such as the well-known cold-start problem that is explained
later in this section.

Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of
Computer Science (CS) and Information Systems (IS) [4]
In the rest of this section, some of the most popular approaches for content-based and
collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching up the attributes of an
object with a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known;

• Unstructured data: attributes do not have a well-known set of values. Content analyzers are
usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free text. As
mentioned previously, content needs to be analysed, and the information in it needs to be translated
into quantitative values, so that a recommendation can be made. With the Vector Space Model
(VSM), documents can be represented as vectors of weights associated with specific terms or
keywords. Each keyword or term is considered to be an attribute, and its weight relates to the
relevance associated between it and the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency
measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies,
TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_ij = f_ij / max_z f_zj        (2.1)

where, for a document j and a keyword i, f_ij corresponds to the number of times that i appears in j.
This value is divided by the maximum f_zj, which corresponds to the maximum frequency observed
over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = log(N / n_i)        (2.2)

In the formula, N is the total number of documents, and n_i represents the number of documents in
which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword i in a document j as:

w_ij = TF_ij × IDF_i        (2.3)
It is important to notice that TF-IDF does not identify the context where words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document and their occurrence in other documents are taken into consideration when attributing
a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer
documents from being preferred over shorter ones [5]. To normalize these weights, a cosine
normalization is usually employed:

w_ij = TF-IDF_ij / sqrt( Σ_{z=1..K} (TF-IDF_zj)² )        (2.4)
With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = Σ_k w_ka · w_kb / ( sqrt(Σ_k w_ka²) · sqrt(Σ_k w_kb²) )        (2.5)
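Equations 2.1-2.5 can be sketched end to end as follows (a minimal illustration; a real system would typically rely on an IR library):

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    # docs: list of token lists. Returns one weight vector per document,
    # combining Eq. 2.1 (TF), Eq. 2.2 (IDF) and Eq. 2.3 (their product).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_f = max(counts.values())
        vectors.append({t: (f / max_f) * math.log(n / df[t])
                        for t, f in counts.items()})
    return vectors

def cosine(a, b):
    # Eq. 2.5: dot product divided by the product of the vector norms.
    norm = lambda v: math.sqrt(sum(w * w for w in v.values()))
    if not norm(a) or not norm(b):
        return 0.0
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    return dot / (norm(a) * norm(b))

docs = [["tomato", "basil", "pasta"], ["tomato", "soup"], ["chocolate", "cake"]]
v = tf_idf_vectors(docs)
print(cosine(v[0], v[1]))  # shared "tomato" gives a small positive similarity
print(cosine(v[0], v[2]))  # no shared terms: 0.0
```

Dividing each vector by its own norm, as in Eq. (2.4), would make the dot product alone equal to the cosine similarity.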
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system
according to their information needs, later averaging this information to improve the retrieval.
Rocchio's method can also be used as a classifier for content-based filtering. Documents are
represented as vectors, where each component corresponds to a term, usually a word. The weight
attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback,
document vectors of positive and negative examples are combined into a prototype vector for each
class c. These prototype vectors represent the learning process in this algorithm. New documents
are then classified according to the similarity between the prototype vector of each class and the
corresponding document vector, using, for example, the well-known cosine similarity metric
(Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest
similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, where T is the vocabulary, i.e., the set of distinct terms in the training set. The weight for each term is given by the following formula:
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|}    (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the influence of the positive and negative examples. A document d_j is assigned to the class c_i whose prototype vector \vec{c}_i has the highest similarity value with the document vector \vec{d}_j.
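The prototype computation of Eq. (2.6) can be sketched as follows; the \beta and \gamma values below are illustrative defaults, not values taken from the text, and the toy TF-IDF vectors are hypothetical:

```python
def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    """Prototype vector for one class per Eq. (2.6): the beta-weighted
    centroid of the positive example vectors minus the gamma-weighted
    centroid of the negative ones. Vectors are dicts of TF-IDF weights."""
    terms = {t for d in pos + neg for t in d}
    proto = {}
    for t in terms:
        p = sum(d.get(t, 0.0) for d in pos) / len(pos) if pos else 0.0
        n = sum(d.get(t, 0.0) for d in neg) / len(neg) if neg else 0.0
        proto[t] = beta * p - gamma * n
    return proto

# Toy TF-IDF document vectors for a "liked" class
pos = [{"pasta": 0.8, "cheese": 0.6}, {"pasta": 0.6, "tomato": 0.8}]
neg = [{"liver": 0.9, "onion": 0.4}]
proto = rocchio_prototype(pos, neg)
print(proto["pasta"], proto["liver"])  # positive weight vs. negative weight
```

A new document would then be compared against each class prototype with cosine similarity, and assigned to the class with the highest score.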
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:
- P(c): probability of observing a document in class c
- P(d|c): probability of observing the document d given a class c
- P(d): probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

P(c|d) = \frac{P(c) P(d|c)}{P(d)}    (2.7)
When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)}    (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed because it is very unlikely to observe the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since terms in a document are not, in theory, independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, indicating whether the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)}    (2.9)
In the formula, N(d_i, t_k) represents the number of times the word or term t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
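A minimal multinomial Naive Bayes classifier following Eq. (2.9) can be sketched as below. Laplace smoothing is added so that unseen class/term pairs do not zero out the product; the smoothing is a standard addition, not spelled out in the text, and the toy corpus is illustrative:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate priors P(c) and smoothed conditionals P(t|c) from
    tokenized training documents (the multinomial event model)."""
    classes = set(labels)
    vocab = {t for d in docs for t in d}
    priors, cond = {}, {}
    for c in classes:
        in_c = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = len(in_c) / len(docs)
        counts = Counter(t for d in in_c for t in d)
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + alpha) / (total + alpha * len(vocab))
                   for t in vocab}
    return priors, cond, vocab

def classify(doc, priors, cond, vocab):
    """Pick the class maximizing Eq. (2.9), computed in log space;
    P(d) is dropped as it is constant across classes. Iterating over the
    tokens adds log P(t|c) once per occurrence, i.e. the exponent N(d, t)."""
    best, best_lp = None, -math.inf
    for c in priors:
        lp = math.log(priors[c])
        for t in doc:
            if t in vocab:
                lp += math.log(cond[c][t])
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = [["great", "meal"], ["tasty", "meal"], ["bad", "service"]]
labels = ["relevant", "relevant", "irrelevant"]
model = train_multinomial_nb(docs, labels)
print(classify(["tasty", "great"], *model))  # → relevant
```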
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms. Branches originating from them are labelled according to tests done on the weight that the term has in the document. Leaves are then labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time: since there is no training phase, all the computation is performed when classifying a new item.
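The procedure just described can be sketched in a few lines; the training items and weights below are illustrative VSM vectors, not data from the text:

```python
import math
from collections import Counter

def knn_classify(item, training, k=3):
    """k-nearest-neighbor classification: keep all (vector, label) pairs
    in memory, rank them by cosine similarity to the query item, and take
    a majority vote among the k closest. Vectors are dicts of weights."""
    def cos(a, b):
        num = sum(a[t] * b.get(t, 0.0) for t in a)
        den = (math.sqrt(sum(v * v for v in a.values())) *
               math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0
    neighbors = sorted(training, key=lambda p: cos(item, p[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

training = [({"pasta": 1.0, "cheese": 0.5}, "liked"),
            ({"tomato": 1.0, "pasta": 0.4}, "liked"),
            ({"liver": 1.0}, "disliked")]
print(knn_classify({"pasta": 0.9, "tomato": 0.3}, training, k=1))  # → liked
```

Note that every call ranks the entire training set, which is exactly the classification-time cost discussed above.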
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \sqrt{\sum_{s \in S} r_{bs}^2}}    (2.10)
In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave to the same item. However, this measure does not take into consideration an important factor, namely the differences in rating behaviour between users.
In Figure 2.2 it can be observed that Alice and User1 rated the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S}(r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{as} - \bar{r}_a)^2 \sum_{s \in S}(r_{bs} - \bar{r}_b)^2}}    (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (2.12)
In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
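The prediction for Alice can be reproduced numerically. The sketch below hard-codes Table 2.1 and applies Eqs. (2.11) and (2.12); it takes the similarity means over the co-rated items and the prediction means over each user's full rating set, which is one common convention rather than the only possible reading:

```python
import math

ratings = {  # Table 2.1; Item5 is missing for Alice
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    """Eq. (2.11) over the items S co-rated by a and b."""
    s = set(ratings[a]) & set(ratings[b])
    ma = sum(ratings[a][i] for i in s) / len(s)
    mb = sum(ratings[b][i] for i in s) / len(s)
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in s)
    den = math.sqrt(sum((ratings[a][i] - ma) ** 2 for i in s) *
                    sum((ratings[b][i] - mb) ** 2 for i in s))
    return num / den if den else 0.0

def predict(a, item, n=2):
    """Eq. (2.12): a's mean plus the similarity-weighted mean deviation
    of the n most similar users who rated the item."""
    ma = sum(ratings[a].values()) / len(ratings[a])
    peers = sorted((u for u in ratings if u != a and item in ratings[u]),
                   key=lambda u: pearson(a, u), reverse=True)[:n]
    num = sum(pearson(a, u) *
              (ratings[u][item] - sum(ratings[u].values()) / len(ratings[u]))
              for u in peers)
    den = sum(pearson(a, u) for u in peers)
    return ma + num / den

print(round(predict("Alice", "Item5"), 2))  # → 4.87
```

User1 and User2 (similarities roughly 0.85 and 0.71) are selected as neighbors; User4, who rates in the opposite pattern, gets a strongly negative correlation and is excluded.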
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence suggests that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section; other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings and even achieve desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be associated with content features (e.g., comedies or dramas liked by a user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
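A weighted parallelized hybrid reduces to a linear combination of independent predictions. The sketch below is purely illustrative: the two component recommenders are hypothetical stand-ins, and the fixed weights could equally well be learned:

```python
def weighted_hybrid(predictors, weights, user, item):
    """Parallelized hybridization with a weighting scheme: every
    recommender scores the same (user, item) pair independently,
    and the scores are combined linearly."""
    return sum(w * p(user, item) for p, w in zip(predictors, weights))

# Hypothetical component recommenders returning rating predictions
content_based = lambda u, i: 4.0   # stand-in for a content-based score
collaborative = lambda u, i: 5.0   # stand-in for a collaborative score

print(weighted_hybrid([content_based, collaborative],
                      [0.3, 0.7], "Alice", "Item5"))  # → 4.7
```

A voting scheme would differ only in the combination step, e.g. taking the majority of binary recommend/do-not-recommend decisions instead of a weighted sum.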
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}    (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
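Eqs. (2.13) to (2.16) translate directly into code. The counts and rating lists below are made-up examples, chosen only to exercise each measure:

```python
import math

def precision(tp, fp):
    return tp / (tp + fp)                                  # Eq. (2.13)

def recall(tp, fn):
    return tp / (tp + fn)                                  # Eq. (2.14)

def mae(predicted, actual):                                # Eq. (2.15)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):                               # Eq. (2.16)
    return math.sqrt(sum((p - r) ** 2
                         for p, r in zip(predicted, actual)) / len(actual))

# 8 items recommended, 6 of them relevant; 4 relevant items were missed
print(precision(tp=6, fp=2))               # 0.75
print(recall(tp=6, fn=4))                  # 0.6

predicted, actual = [4.5, 3.0, 2.0], [5, 3, 4]
print(round(mae(predicted, actual), 2))    # 0.83
print(round(rmse(predicted, actual), 2))   # 1.19
```

The example also shows RMSE's emphasis on large deviations: the single error of 2 rating points dominates the RMSE far more than it does the MAE.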
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.

¹ http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I_k^+, an equation based on the idea of TF-IDF is used:

I_k^+ = FF_k \times IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D}    (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k}    (3.3)
The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing history with which the user has never cooked.
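Eqs. (3.1) to (3.3) can be sketched as follows; the recipe database, usage counts, and period length are hypothetical, chosen only to illustrate the computation:

```python
import math

def favourite_ingredient_scores(used, days, recipes):
    """Score I+_k = FF_k x IRF_k following Eqs. (3.1)-(3.3).
    `used` maps each ingredient to the number of times the user cooked
    with it over `days`; `recipes` is the recipe database, with each
    recipe given as a set of ingredients."""
    m = len(recipes)
    scores = {}
    for k, f in used.items():
        ff = f / days                                  # Eq. (3.2): FF_k = F_k / D
        mk = sum(1 for r in recipes if k in r)         # recipes containing k
        irf = math.log(m / mk)                         # Eq. (3.3)
        scores[k] = ff * irf                           # Eq. (3.1)
    return scores

recipes = [{"rice", "egg"}, {"rice", "chicken"}, {"egg", "flour"},
           {"chicken", "pepper"}]
used = {"rice": 6, "pepper": 2}   # usage counts over a 30-day period
print(favourite_ingredient_scores(used, days=30, recipes=recipes))
```

Although "pepper" is rarer in the database and therefore has a higher IRF, "rice" still scores higher here because its frequency of use dominates, which mirrors how the TF and IDF components trade off in the original scheme.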
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I_k^+, were computed. The F-measure is computed as follows:

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
¹ http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I_k^+ and I_k^-, respectively):
Score(R) = \sum_{k \in R} (I_k \cdot W_k)    (3.6)
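Eqs. (3.5) and (3.6) can be sketched as below. Since the text does not detail how the deviation score maps to W_k, the weights (and the preference values I_k) in the example are taken as given, purely for illustration:

```python
import math

def ingredient_std(quantities):
    """Eq. (3.5): population standard deviation of an ingredient's
    quantity over the n recipes that contain it."""
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe, preferences, weights):
    """Eq. (3.6): sum of preference I_k times weight W_k over the
    recipe's ingredients."""
    return sum(preferences[k] * weights[k] for k in recipe)

# Quantities of one ingredient across four recipes (illustrative, in grams)
print(ingredient_std([1, 2, 1, 2]))              # 0.5

recipe = {"pepper", "potato"}
preferences = {"pepper": -0.4, "potato": 0.8}    # I_k: disliked vs. liked
weights = {"pepper": 2.0, "potato": 1.0}         # illustrative W_k values
print(recipe_score(recipe, preferences, weights))
```

With these illustrative numbers, the strong weight on the disliked pepper cancels out the liked potato, giving the recipe a neutral score, which is the intended effect of weighting high-impact ingredients more heavily.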
The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:

v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}    (3.7)
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
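The construction of the dense matrix V from Eq. (3.7) is mechanical and can be sketched as follows; the content-based predictor here is a hypothetical stand-in returning a constant, and the user/item names are made up:

```python
def pseudo_ratings(actual, content_pred, users, items):
    """Eq. (3.7): build the dense pseudo user-ratings matrix V, keeping
    actual ratings r_ui where they exist and filling every gap with the
    content-based prediction c_ui."""
    return {u: {i: actual[u][i] if i in actual.get(u, {})
                else content_pred(u, i)
                for i in items}
            for u in users}

actual = {"u1": {"movie_a": 5}, "u2": {"movie_b": 2}}
content_pred = lambda u, i: 3   # hypothetical content-based predictor
V = pseudo_ratings(actual, content_pred, ["u1", "u2"], ["movie_a", "movie_b"])
print(V["u1"])   # {'movie_a': 5, 'movie_b': 3}
```

The resulting matrix has no missing entries, which is precisely what lets the subsequent Pearson-based collaborative step operate without the sparsity problem.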
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the naive hybrid approach obtained a MAE of 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores are computed by averaging the ratings of the recipes in which they occur.
As a baseline, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were then developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into its ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the rating scores after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, on the assumption that the items common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has the best overall performance in this case, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implements a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating, which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

¹ https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (4.2)

² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the result is normalized by the sum of similarities.
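As a rough sketch, the item-to-item similarity of Eq. 4.1 could be computed as below. The dictionary-based `ratings` layout and the helper names are assumptions for illustration, not part of the YoLP implementation.

```python
from math import sqrt

def pearson_item_sim(ratings, a, b):
    """ratings: {user: {item: rating}}. Pearson correlation (Eq. 4.1)
    computed over the users who rated both items a and b."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0

    def avg(item):
        # Item average over all of its ratings.
        vals = [r[item] for r in ratings.values() if item in r]
        return sum(vals) / len(vals)

    ra, rb = avg(a), avg(b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in common)
    den = sqrt(sum((ratings[u][a] - ra) ** 2 for u in common)) * \
          sqrt(sum((ratings[u][b] - rb) ** 2 for u in common))
    return num / den if den else 0.0
```

Two items rated identically by every shared user yield a similarity of 1.0, while items with no raters in common yield 0.0.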
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
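A minimal sketch of this profile construction and matching, assuming a sparse dictionary representation for the binary profile and feature vectors (the names and the dictionary layout are illustrative):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_profile(rated_recipes, positive_threshold=4):
    """Binary profile: the features of every recipe rated >= threshold
    are set to 1 in the user profile vector."""
    profile = {}
    for rating, features in rated_recipes:
        if rating >= positive_threshold:
            for f in features:
                profile[f] = 1
    return profile
```

Recipes can then be ranked for a user by sorting them on `cosine(profile, recipe_vector)` from most to least similar.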
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
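Computing the IRF weights of Eq. 4.4 over a collection of recipes can be sketched as follows; the list-of-ingredient-lists input format is an assumption for illustration:

```python
from math import log

def irf_weights(recipes):
    """recipes: list of ingredient lists. Returns IRF_k = log(M / M_k),
    where M is the total number of recipes and M_k the number of recipes
    containing ingredient k (Eq. 4.4)."""
    M = len(recipes)
    counts = {}
    for ingredients in recipes:
        for ing in set(ingredients):  # count each recipe at most once
            counts[ing] = counts.get(ing, 0) + 1
    return {ing: log(M / mk) for ing, mk in counts.items()}
```

As with IDF, ubiquitous ingredients such as salt end up with weights close to zero, while rare ingredients carry more weight.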
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
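The prototype update described above can be sketched as follows, using the fixed-threshold notion of positive and negative observations; for brevity, this sketch omits the special handling of Food.com's neutral ratings (equal to 3), which the experiments ignore:

```python
def build_prototype(rated_recipes, weights, threshold=3):
    """Rocchio-style prototype: add the recipe's feature weights (IRF values)
    for positive observations (rating >= threshold) and subtract them for
    negative ones. Both observation types carry an equal weight of 1,
    as in the experiments described in the text."""
    prototype = {}
    for rating, features in rated_recipes:
        sign = 1 if rating >= threshold else -1
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * weights.get(f, 0.0)
    return prototype
```

A feature that appears in as many negatively rated as positively rated recipes cancels out to a weight near zero in the prototype.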
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) - min(A)), the user average was used as the default for the recommendation.
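This per-user mapping can be sketched directly from Eq. 4.5, including the fallback to the user average when the similarity interval is degenerate (the function name and parameters are illustrative):

```python
def minmax_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Eq. 4.5: map a similarity value from the user's similarity range
    [sim_min, sim_max] into the user's rating range [r_min, r_max].
    Falls back to the user's average rating when the similarity
    interval cannot be computed (max == min)."""
    if sim_max == sim_min:
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min
```

For instance, a similarity halfway through the user's similarity range maps to the midpoint of that user's rating range.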
Using average and standard deviation values from training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combination of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                     Food.com   Epicurious
Number of users                        24741        8117
Number of food items                  226025       14976
Number of rating events               956826       86574
Number of ratings above avg           726467       46588
Number of groups                         108          68
Number of ingredients                   5074         338
Number of categories                      28          14
Sparsity of the ratings matrix (%)      0.02        0.07
Avg rating value                        4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61
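Eq. 4.6 translates directly into a small helper; the default thresholds below follow the initial values stated in the text (U = 0.75, L = 0.25):

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: bump the average rating by one standard deviation for very
    similar recipes, subtract it for very dissimilar ones, and return the
    plain average in between."""
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg
```

With an average of 3.5 and a standard deviation of 0.5, a similarity of 0.8 yields 4.0, a similarity of 0.5 yields 3.5, and a similarity of 0.1 yields 3.0.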
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe sharing community⁴. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
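The filtering step can be sketched as a two-pass procedure. Note that performing one pass per criterion is an assumption; the text does not state whether the filter was iterated until stable:

```python
from collections import Counter

def filter_dataset(events):
    """events: list of (user, item, rating). Keep recipes rated more than
    3 times, then keep users who rated more than 5 times, as described
    for the Epicurious dataset cleanup."""
    item_counts = Counter(item for _, item, _ in events)
    events = [e for e in events if item_counts[e[1]] > 3]
    user_counts = Counter(user for user, _, _ in events)
    return [e for e in events if user_counts[e[0]] > 5]
```

Filtering users after filtering items matters: dropping sparse items can push a user below the 5-rating threshold.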
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

bull Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

bull Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
bull Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different sets of p observations as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds and the process repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
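The 5-fold procedure can be sketched as below; the round-robin fold assignment is an illustrative choice, not necessarily the split used in the experiments:

```python
def k_fold_splits(events, k=5):
    """Partition rating events into k folds; each fold serves once as the
    20% validation set, with the remaining 80% as the training set."""
    folds = [events[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

Each rating event appears in exactly one validation set across the 5 iterations, so the averaged error covers the whole dataset.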
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
bull User identification: userID

bull Item identification: itemID

bull Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
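The two metrics used throughout the validation can be written directly from their standard definitions:

```python
from math import sqrt

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between
    predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on large errors."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

Because RMSE squares each deviation before averaging, a prediction that misses by 2 contributes four times as much as one that misses by 1, which explains the diverging MAE/RMSE behaviour observed later in Section 5.4.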
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
                                MAE      RMSE      MAE      RMSE
YoLP Content-based component    0.6389   0.8279    0.3590   0.6536
YoLP Collaborative component    0.6454   0.8678    0.3761   0.6834
User Average                    0.6315   0.8338    0.4077   0.6207
Item Average                    0.7701   1.0930    0.4385   0.7043
Combined Average                0.6628   0.8572    0.4180   0.6250

Table 5.2: Test Results

                                            Epicurious                            Food.com
                                 Observation        Observation        Observation        Observation
                                 User Average       Fixed Threshold    User Average       Fixed Threshold
                                 MAE      RMSE      MAE      RMSE      MAE      RMSE      MAE      RMSE
User Avg + User Standard
Deviation                        0.8217   1.0606    0.7759   1.0283    0.4448   0.6812    0.4287   0.6624
Item Avg + Item Standard
Deviation                        0.8914   1.1550    0.8388   1.1106    0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and
Item Standard Deviation          0.8304   1.0296    0.7824   0.9927    0.4390   0.6506    0.4324   0.6449
Min-Max                          0.8539   1.1533    0.7721   1.0705    0.6648   0.9847    0.6303   0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                       Epicurious          Food.com
                                     MAE      RMSE      MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927    0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012    0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986    0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616    0.4324   0.7087
Ingredients                          0.7932   1.0054    0.4411   0.6537
Cuisine                              0.8553   1.0810    0.5357   0.7431
Dietary                              0.8772   1.0807    0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

bull Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so, when computing a user's prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
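The merging of the three stored per-feature vectors can be sketched as follows, assuming that feature identifiers of different types do not collide (the dictionary layout and names are illustrative):

```python
def merge_prototypes(user_vectors, feature_types):
    """user_vectors: {"ingredients": {...}, "cuisine": {...}, "dietary": {...}},
    one sparse prototype vector per feature type. Merging only the selected
    types lets each feature combination be tested without rebuilding
    any prototype vector."""
    merged = {}
    for ft in feature_types:
        merged.update(user_vectors.get(ft, {}))
    return merged
```

Testing, for instance, Ingredients + Cuisine then only requires `merge_prototypes(vectors, ["ingredients", "cuisine"])` before the cosine similarity computation.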
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and, although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between tests.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. However, although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while for others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better results
when using the Food.com dataset.
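The trade-off between the two metrics discussed above can be made concrete with a small sketch. The rating vectors below are made-up illustrative values, not data from the actual experiments:

```python
import math

def mae(predicted, actual):
    # Mean Absolute Error: average absolute deviation
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root Mean Squared Error: places more emphasis on larger deviations
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

actual = [4, 4, 4, 4]
# Many exact hits but one large miss: lower MAE, comparatively high RMSE
hits_with_big_miss = [4, 4, 4, 1]
# No exact hits but only small deviations: higher MAE, lower RMSE
small_misses = [3, 5, 3, 5]

print(mae(hits_with_big_miss, actual), rmse(hits_with_big_miss, actual))  # 0.75 1.5
print(mae(small_misses, actual), rmse(small_misses, actual))              # 1.0 1.0
```

The first predictor "wins" on MAE and loses on RMSE, exactly the behaviour observed when lowering the threshold U.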
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and whether
the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute
error and standard deviation values. The line in these two graphs indicates the average value of the
points in its proximity.
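The per-user quantities plotted in these figures could be computed along the following lines. This is a sketch over made-up data; the structure and field names are assumptions, not taken from the actual experimental component:

```python
from statistics import mean, pstdev

# Hypothetical per-user data: (actual rating, predicted rating) pairs
reviews_by_user = {
    "user_a": [(4, 4.0), (4, 3.8), (4, 4.1)],           # consistent rater
    "user_b": [(1, 2.5), (5, 3.9), (3, 3.2), (5, 4.0)], # high-variance rater
}

points = []
for user, reviews in reviews_by_user.items():
    actual = [a for a, _ in reviews]
    std_dev = pstdev(actual)                          # user's rating standard deviation
    abs_error = mean(abs(a - p) for a, p in reviews)  # user's mean absolute error
    points.append((user, std_dev, abs_error))

for user, sd, err in points:
    print(f"{user}: std={sd:.2f} abs_error={err:.2f}")
```

Each resulting tuple corresponds to one point in the scatter plots of Figs. 5.6 and 5.7.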
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to increase slowly for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small size of this dataset, and
the lower density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a certain number
of reviews is reached. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable number of users from which to
average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
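The round-by-round procedure described above can be sketched as follows. To keep the example self-contained, a simple running-average predictor stands in for the full Rocchio component, and the rating history is made up:

```python
from statistics import mean

def learning_curve(ratings, rounds):
    """Simulate incremental learning: each round moves one more review from
    the validation set into the training set and measures MAE on the rest.

    A running-average predictor is used here as a stand-in for the full
    Rocchio recommendation component.
    """
    errors = []
    for n in range(1, rounds + 1):
        train, validation = ratings[:n], ratings[n:]
        prediction = mean(train)  # model "learned" from the training set
        errors.append(mean(abs(r - prediction) for r in validation))
    return errors

# Hypothetical rating history of a single user
history = [5, 4, 4, 5, 3, 4, 5, 4, 4, 4]
curve = learning_curve(history, rounds=len(history) - 1)
print([round(e, 2) for e in curve])
```

Averaging such curves over all users with enough rated recipes yields plots analogous to Figs. 5.8 to 5.10.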
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation
was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breakdown of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various
approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the recommendation
system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to tune the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since the two datasets have very different characteristics, not improving on the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, which were chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors, and adding the major difference in dataset
sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendation, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e.,
lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these
features have on the recommendation is another interesting direction to approach in the future,
when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research, and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in computer
science and information systems: A landscape of research. In E-Commerce and Web
Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3.
doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web,
4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive
algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.
ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender
systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation
algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions
on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002. ISBN
1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative
filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted
Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved
recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by
considering the user's preference and ingredient quantity of target recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation,
and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696.
doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions
on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe
recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection.
In International Joint Conference on Artificial Intelligence, volume 14(12), pages 1137–1143, 1995.
Chapter 2
Fundamental Concepts
In this chapter, various fundamental concepts on recommendation systems are presented, in order
to better understand the proposed objectives and the following chapter on related work. These
concepts include some of the most popular recommendation and evaluation methods.
2.1 Recommendation Systems
Based on how recommendations are made, recommendation systems are usually classified into the
following categories [2]:

• Knowledge-based recommendation systems
• Content-based recommendation systems
• Collaborative recommendation systems
• Hybrid recommendation systems
In Figure 2.1, it is possible to see that collaborative filtering is currently the most popular approach
for developing recommendation systems. Collaborative methods focus more on rating-based
recommendations. Content-based approaches, instead, relate more to classical Information Retrieval
methods and focus on keywords as content descriptors to generate recommendations. Because
of this, content-based methods are very popular when recommending documents, news articles,
or web pages, for example.
Knowledge-based systems suggest products based on inferences about users' needs and preferences.
Two basic types of knowledge-based systems exist [2]: constraint-based and case-based.
Both approaches are similar in their recommendation process: the user specifies the requirements
and the system tries to identify a solution. However, constraint-based systems recommend items
using an explicitly defined set of recommendation rules, while case-based systems use similarity
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often
used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative
and content-based systems, such as the well-known cold-start problem, which is explained later
in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative
methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods basically consist in matching up the attributes of an object
with a user profile, finally recommending the objects with the highest match. The user profile
can be created implicitly, using the information gathered over time from user interactions with the
system, or explicitly, where the profiling information comes directly from the user. Content-based
recommendation systems can analyze two different types of data [5]:

• Structured data: items are described by the same set of attributes used in the user profiles,
and the values that these attributes may take are known.
• Unstructured data: attributes do not have a well-known set of values. Content analyzers are
usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free text. As
mentioned previously, content needs to be analysed, and the information in it needs to be translated
into quantitative values so that a recommendation can be made. With the Vector Space
Model (VSM), documents can be represented as vectors of weights associated with specific terms
or keywords. Each keyword or term is considered to be an attribute, and its weight relates to the
relevance associated between it and the document. This simple method is an example of how
unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency
measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name
implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

\[ TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \tag{2.1} \]

where, for a document j and a keyword i, f_{i,j} corresponds to the number of times that i appears in j.
This value is divided by the maximum f_{z,j}, which corresponds to the maximum frequency observed
over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance
levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare
keywords are more relevant than frequent keywords. IDF is defined as follows:

\[ IDF_i = \log\left(\frac{N}{n_i}\right) \tag{2.2} \]

In the formula, N is the total number of documents and n_i represents the number of documents in
which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight
of a keyword i in a document j as:

\[ w_{i,j} = TF_{i,j} \times IDF_i \tag{2.3} \]
It is important to notice that TF-IDF does not identify the context in which the words are used. For
example, when an article contains a phrase with a negation, as in "this article does not talk about
recommendation systems", the negative context is not recognized by TF-IDF. The same applies to
the quality of the document: two documents using the same terms will have the same weights
attributed to their content, even if one of them is better written. Only the keyword frequencies in
the document, and their occurrence in other documents, are taken into consideration when giving a
weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. 2.3, prevents longer documents
from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is
usually employed:

\[ w_{i,j} = \frac{\text{TF-IDF}_{i,j}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{z,j})^2}} \tag{2.4} \]
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. 2.5, is commonly used:

\[ Similarity(a, b) = \frac{\sum_k w_{k,a} w_{k,b}}{\sqrt{\sum_k w_{k,a}^2} \sqrt{\sum_k w_{k,b}^2}} \tag{2.5} \]
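The weighting and similarity computations of Eqs. 2.1 to 2.5 can be sketched over a toy corpus as follows. This is a minimal illustration; the example documents are made up:

```python
import math
from collections import Counter

def tf_idf_vector(doc, corpus):
    """Cosine-normalized TF-IDF weights (Eqs. 2.1-2.4) for one document."""
    counts = Counter(doc)
    max_freq = max(counts.values())
    n_docs = len(corpus)
    weights = {}
    for term, freq in counts.items():
        tf = freq / max_freq                                     # Eq. 2.1
        idf = math.log(n_docs / sum(term in d for d in corpus))  # Eq. 2.2
        weights[term] = tf * idf                                 # Eq. 2.3
    norm = math.sqrt(sum(w * w for w in weights.values()))       # Eq. 2.4
    return {t: w / norm for t, w in weights.items()} if norm else weights

def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors (Eq. 2.5)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [["pasta", "tomato", "basil"],
          ["pasta", "cream", "bacon"],
          ["rice", "tomato", "pepper"]]
v0, v1, v2 = (tf_idf_vector(d, corpus) for d in corpus)
print(cosine_similarity(v0, v1), cosine_similarity(v0, v2))
```

The two printed similarities are both positive, driven by the shared terms "pasta" and "tomato" respectively.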
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system according
to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors, where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors
of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then classified
according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \(\vec{c}_i = (w_{1,i}, \ldots, w_{|T|,i})\) for each
class c_i, T being the vocabulary composed by the set of distinct terms in the training set. The weight
for each term is given by the following formula:

\[ w_{k,i} = \beta \sum_{d_j \in POS_i} \frac{w_{k,j}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{k,j}}{|NEG_i|} \tag{2.6} \]

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for
class c_i, and w_{k,j} is the TF-IDF weight for term k in document d_j. Parameters β and γ control the
influence of the positive and negative examples. The document d_j is assigned to the class c_i with
the highest similarity value between the prototype vector \(\vec{c}_i\) and the document vector \(\vec{d}_j\).
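The prototype computation of Eq. 2.6, followed by the classification step, can be sketched as below. The toy TF-IDF vectors and the β and γ values are illustrative choices, not taken from the thesis experiments:

```python
import math

def prototype(pos_docs, neg_docs, beta=16, gamma=4):
    """Build a Rocchio prototype vector (Eq. 2.6) from TF-IDF weight dicts.

    beta and gamma control the influence of positive and negative examples;
    the defaults here are common choices, assumed for illustration.
    """
    proto = {}
    for docs, coeff in ((pos_docs, beta / len(pos_docs)),
                        (neg_docs, -gamma / len(neg_docs))):
        for doc in docs:
            for term, w in doc.items():
                proto[term] = proto.get(term, 0.0) + coeff * w
    return proto

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy TF-IDF vectors: recipes the user liked (positive) vs. disliked (negative)
liked = [{"tomato": 0.8, "basil": 0.6}, {"tomato": 0.7, "pasta": 0.7}]
disliked = [{"bacon": 0.9, "cream": 0.4}]
proto = prototype(liked, disliked)

new_recipe = {"tomato": 0.9, "pasta": 0.4}
print(cosine(proto, new_recipe))  # positive: the recipe resembles liked items
```

Terms from negative examples receive negative weights in the prototype, pushing similar documents away from the class.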
Although this method has an intuitive justification, it does not have any theoretical underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learning,
a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied
extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine
learning methods are other examples of techniques used to perform content-based recommendation.
These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging
to a class c, using a set of probabilities previously calculated from the observed data, or
training data as it is commonly called. These probabilities are:
• P(c): probability of observing a document in class c
• P(d|c): probability of observing the document d given a class c
• P(d): probability of observing the document d
Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying Bayes' theorem:

\[ P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{2.7} \]

When performing classification, each document d is assigned to the class c_j with the highest
probability:

\[ \arg\max_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)} \tag{2.8} \]
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant or irrelevant
documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined
based on individual word occurrences rather than on the document as a whole. This simplification
is needed due to the fact that it is very unlikely to see the exact same document more than once;
without it, the observed data would not be enough to generate good probabilities. Although this
simplification clearly violates the conditional independence assumption, since terms in a document are
not theoretically independent from each other, experiments show that the Naive Bayes classifier has
very good results when classifying text documents. Two different models are commonly used when
working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli
event model, encodes each word as a binary attribute, related to the appearance of the
word in a document. The second, typically referred to as the multinomial event model, identifies the
number of times each word appears in the document. These models see the document as a vector
of values over a vocabulary V, and they both lose the information about word order. Empirically,
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model,
especially for large vocabularies [9]. This model is represented by the following equation:

\[ P(c_j|d_i) = P(c_j) \prod_{t_k \in V \cap d_i} P(t_k|c_j)^{N(d_i, t_k)} \tag{2.9} \]

In the formula, N(d_i, t_k) represents the number of times the word or term t_k appeared in document d_i.
Therefore, only the words from the vocabulary V that appear in the document d_i are used.
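The multinomial model of Eq. 2.9 can be sketched as follows. Laplace smoothing is added here to avoid zero probabilities, and the tiny training corpus is made up for illustration:

```python
import math
from collections import Counter

def train_multinomial_nb(docs_by_class):
    """Estimate P(c) and smoothed P(t|c) for a multinomial Naive Bayes classifier."""
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    vocab = {t for docs in docs_by_class.values() for d in docs for t in d}
    model = {}
    for cls, docs in docs_by_class.items():
        counts = Counter(t for d in docs for t in d)
        total = sum(counts.values())
        # Laplace smoothing so unseen terms do not zero out the product
        term_probs = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
        model[cls] = (len(docs) / total_docs, term_probs)
    return model

def classify(model, doc):
    """Assign doc to the class maximizing Eq. 2.9, computed in log space."""
    def score(cls):
        prior, term_probs = model[cls]
        # repeated terms in doc contribute once per occurrence (the exponent N)
        return math.log(prior) + sum(math.log(term_probs[t])
                                     for t in doc if t in term_probs)
    return max(model, key=score)

training = {
    "relevant":   [["tomato", "pasta"], ["tomato", "basil", "pasta"]],
    "irrelevant": [["bacon", "cream"], ["bacon", "rice"]],
}
model = train_multinomial_nb(training)
print(classify(model, ["tomato", "pasta", "rice"]))  # relevant
```

Working in log space avoids the numerical underflow that the raw product of many small probabilities would cause.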
Decision trees and nearest neighbor methods are other examples of important learning algorithms
used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning training data into subgroups, until those subgroups contain only instances
of a single class. In the case of a document, the tree's internal nodes represent labelled terms;
branches originating from them are labelled according to tests done on the weight that the term
has in the document, and leaves are labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new
unlabeled item, the algorithm compares it to all stored items using a similarity function, and then
determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data: the Euclidean distance metric is often chosen when working
with structured data, while for items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback
is their inefficiency at classification time, due to the fact that they have no training phase and all
the computation is performed at classification time.
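A nearest-neighbor classification step over VSM vectors might look like the sketch below, using cosine similarity and majority voting over toy data:

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(training, item, k=3):
    """Classify item by majority vote among its k most similar neighbors.

    training is a list of (vector, label) pairs; as noted above, all the
    computation happens at classification time.
    """
    neighbors = sorted(training, key=lambda tv: cosine(tv[0], item),
                       reverse=True)[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy TF-IDF vectors labelled with hypothetical cuisine classes
training = [({"tomato": 1.0, "basil": 0.5}, "italian"),
            ({"pasta": 1.0, "tomato": 0.3}, "italian"),
            ({"rice": 1.0, "soy": 0.8}, "asian"),
            ({"soy": 1.0, "ginger": 0.6}, "asian")]
print(knn_classify(training, {"tomato": 0.9, "pasta": 0.5}, k=3))  # italian
```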
These algorithms represent some of the most important methods used in content-based recommendation
systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended object, and when these features cannot be parsed
automatically by a computer, they have to be assigned manually, which is often not practical due to
limitations of resources. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high; this problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles,
the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular
user based on the items previously rated by other users. This approach is also known as the
"wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes, or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various algorithms
and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the recommendation.
These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially
heuristics that make rating predictions based on the entire collection of items previously rated
by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for a
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. A similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} \, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^{2}} \, \sqrt{\sum_{s \in S} r_{b,s}^{2}}} \qquad (2.10)

In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically constant. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items common to both are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings, these four items are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account.
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S}(r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{a,s} - \bar{r}_a)^{2}} \, \sqrt{\sum_{s \in S}(r_{b,s} - \bar{r}_b)^{2}}} \qquad (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \qquad (2.12)
In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that have rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
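As a concrete illustration, the steps above can be sketched in Python using the ratings from Table 2.1 (here, user means are taken over all of a user's ratings, which may differ slightly from variants that use only the common items):

```python
import math

# Ratings from Table 2.1; Item5 is unknown to Alice.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(u):
    return sum(ratings[u].values()) / len(ratings[u])

def pearson(a, b):
    # Pearson correlation over the items both users have rated (Eq. 2.11).
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = mean(a), mean(b)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = (math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common))
           * math.sqrt(sum((ratings[b][s] - rb) ** 2 for s in common)))
    return num / den if den else 0.0

def predict(a, p, neighbours):
    # Weighted deviations from the neighbours' mean ratings (Eq. 2.12).
    num = sum(pearson(a, b) * (ratings[b][p] - mean(b)) for b in neighbours)
    den = sum(pearson(a, b) for b in neighbours)
    return mean(a) + num / den

sim_u1 = pearson("Alice", "User1")                          # strongly positive
prediction = predict("Alice", "Item5", ["User1", "User2"])  # above Alice's mean of 4
```

User4, who rates in the opposite pattern to Alice, obtains a negative correlation, so his ratings would pull the prediction in the opposite direction.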
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and, at the same time, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem; most of them use the hybrid recommendation approach presented in the next section, while others use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even to reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).
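A minimal sketch of the weighting scheme in a parallelized design, combining the outputs of two independent recommenders; the score dictionaries and weights below are illustrative, not taken from any of the cited systems:

```python
# Hypothetical prediction scores from two independent recommenders.
content_scores = {"pizza": 4.2, "sushi": 3.1, "salad": 2.5}
collab_scores  = {"pizza": 3.8, "sushi": 4.4, "salad": 2.0}

def weighted_hybrid(scores_a, scores_b, w_a=0.5, w_b=0.5):
    # Linear combination of the two components' scores (parallelized design).
    return {item: w_a * scores_a[item] + w_b * scores_b[item] for item in scores_a}

combined = weighted_hybrid(content_scores, collab_scores, w_a=0.6, w_b=0.4)
ranking = sorted(combined, key=combined.get, reverse=True)  # best item first
```

In practice the weights could be fixed manually or learned dynamically, e.g., shifted towards the collaborative component as more ratings become available.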
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be (and have been) studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp} \qquad (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn} \qquad (2.14)
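The two measures can be sketched directly from the confusion counts; the counts below are illustrative:

```python
def precision(tp, fp):
    # Eq. 2.13: fraction of recommended items that are actually relevant.
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. 2.14: fraction of relevant items that were actually recommended.
    return tp / (tp + fn)

# Toy confusion counts: 8 hits, 2 false alarms, 4 relevant items missed.
p = precision(8, 2)  # 0.8
r = recall(8, 4)     # about 0.667
```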
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^{2}} \qquad (2.16)
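Both error measures are straightforward to sketch; note how a single large deviation affects RMSE more than MAE (the rating values below are illustrative):

```python
import math

def mae(predicted, actual):
    # Eq. 2.15: mean absolute deviation between predicted and actual ratings.
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Eq. 2.16: squaring penalizes large deviations more heavily than MAE.
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.5, 2.0, 5.0]
actual = [4, 3, 4, 5]
# The single large error (2.0 vs 4) inflates RMSE well above MAE.
```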
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.
¹http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^{+}_k, an equation based on the idea of TF-IDF is used:
I^{+}_k = FF_k \times IRF_k \qquad (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D} \qquad (3.2)
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
IRF_k = \log \frac{M}{M_k} \qquad (3.3)
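A small sketch of the FF-IRF preference estimate over a hypothetical recipe database and browsing history; all recipe names, ingredients, and the period D below are invented for illustration:

```python
import math
from collections import Counter

# Hypothetical recipe database and user browsing/cooking history.
recipes = {
    "r1": {"chicken", "garlic", "rice"},
    "r2": {"chicken", "onion"},
    "r3": {"tofu", "rice"},
    "r4": {"beef", "onion", "garlic"},
}
user_history = ["r1", "r2", "r2"]  # events observed during the period D
D = 3                              # length of the observation period

M = len(recipes)
counts = Counter(k for r in user_history for k in recipes[r])

def preference(k):
    # I+_k = FF_k * IRF_k  (Eqs. 3.1-3.3).
    ff = counts[k] / D
    Mk = sum(1 for ings in recipes.values() if k in ings)
    return ff * math.log(M / Mk)

scores = {k: preference(k) for k in counts}
top = max(scores, key=scores.get)  # the user's estimated favourite ingredient
```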
The user's disliked ingredients I^{-}_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I^{+}_k, were computed. The F-measure is computed as follows:
\text{F-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^{+}_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions the impact on a recipe of 100 grams from two different
¹http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and on the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^{2}} \qquad (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^{+}_k and I^{-}_k, respectively):
Score(R) = \sum_{k \in R} I_k \cdot W_k \qquad (3.6)
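The quantity-aware scoring can be sketched as follows. Since the paper's exact mapping from the deviation score to the weight W_k is not reproduced here, the weight below (distance from the average quantity, in standard deviations) is an assumption made for illustration, as are the quantities:

```python
import math

# Hypothetical quantities (grams) of each ingredient across the recipes using it.
quantities = {
    "pepper": [2, 3, 2, 5],
    "potato": [150, 200, 180, 250],
}

def std_dev(values):
    # Eq. 3.5: population standard deviation of an ingredient's quantity.
    mean = sum(values) / len(values)
    return math.sqrt(sum((g - mean) ** 2 for g in values) / len(values))

def weight(k, qty_in_recipe):
    # Assumed weighting: deviation of this recipe's quantity from the
    # ingredient's average, measured in standard deviations.
    values = quantities[k]
    mean = sum(values) / len(values)
    return abs(qty_in_recipe - mean) / std_dev(values)

def score(recipe, prefs):
    # Eq. 3.6: sum of user preference I_k times the quantity weight W_k.
    return sum(prefs[k] * weight(k, qty) for k, qty in recipe.items())

# 100 g of pepper deviates far more from its usual quantity than 100 g of potato.
```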
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u, where available, and of those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7)
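Building the dense pseudo ratings matrix can be sketched as follows, with hypothetical actual ratings and content-based predictions:

```python
# Sparse actual ratings and dense content-based predictions (hypothetical values).
actual = {"u1": {"m1": 5, "m3": 2}, "u2": {"m2": 4}}
content_pred = {
    "u1": {"m1": 4.1, "m2": 3.0, "m3": 2.5},
    "u2": {"m1": 3.2, "m2": 4.4, "m3": 1.8},
}
items = ["m1", "m2", "m3"]

def pseudo_vector(u):
    # Eq. 3.7: keep the real rating when it exists, fall back to the
    # content-based prediction otherwise, yielding a dense vector.
    return [actual[u].get(i, content_pred[u][i]) for i in items]

V = {u: pseudo_vector(u) for u in actual}  # dense pseudo ratings matrix
```

The collaborative step then computes Pearson correlations over these dense vectors instead of the original sparse ratings.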
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
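The recipe-to-ingredient breakdown and the reconstruction of a prediction can be sketched as follows; the ratings and ingredients below are hypothetical:

```python
# A user's recipe ratings and each recipe's ingredients (hypothetical data).
recipe_ratings = {"r1": 5, "r2": 1, "r3": 4}
ingredients = {"r1": {"beef", "onion"}, "r2": {"tofu", "onion"}, "r3": {"beef", "rice"}}

def ingredient_scores():
    # Breakdown: an ingredient's score is the average rating of the
    # recipes in which it occurs.
    sums, counts = {}, {}
    for r, rating in recipe_ratings.items():
        for k in ingredients[r]:
            sums[k] = sums.get(k, 0) + rating
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def predict(recipe_ings, scores):
    # Reconstruction: predict an unseen recipe's rating from the average
    # of its ingredients' scores.
    return sum(scores[k] for k in recipe_ings) / len(recipe_ings)

scores = ingredient_scores()  # e.g. beef averages the ratings of r1 and r3
```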
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that the ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others already seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others known by the user probably carry the same information, and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and a story the user already knows does not need to be recommended. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
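A sketch of this voting mechanism; the similarity values, scores, and thresholds below are invented, standing in for the cosine scores over TF-IDF vectors and the tuned thresholds of the real system:

```python
# Hypothetical (story_id, similarity_to_new_story, user_score) triples.
stored = [("s1", 0.75, 4.0), ("s2", 0.40, 2.0), ("s3", 0.10, 5.0)]
MIN_T, MAX_T = 0.3, 0.9  # assumed voting and "already known" thresholds

def classify(stored):
    # Stories above the minimum similarity threshold become voting stories.
    voters = [(sim, score) for _, sim, score in stored if sim >= MIN_T]
    if not voters:
        return "pass-to-long-term-model", None
    if any(sim >= MAX_T for sim, _ in voters):
        return "known", None  # too close to a story the user has already seen
    # Weighted average of the voters' scores, with similarity as the weight.
    pred = sum(sim * sc for sim, sc in voters) / sum(sim for sim, _ in voters)
    return "predicted", pred
```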
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm to personalized food recommendations. These components provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, while in the item-to-item approach two items are considered similar if they were rated in a similar way by the
¹https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation²
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P}(r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P}(r_{a,p} - \bar{r}_a)^{2}} \, \sqrt{\sum_{p \in P}(r_{b,p} - \bar{r}_b)^{2}}} \qquad (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)} \qquad (4.2)
²http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the result is normalized by the sum of the similarities to produce the predicted rating.
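The item-to-item Pearson similarity of Eq. (4.1) can be sketched over a small hypothetical ratings matrix:

```python
import math

# Ratings matrix: user -> {recipe: rating} (hypothetical values).
ratings = {
    "u1": {"a": 4, "b": 5, "c": 1},
    "u2": {"a": 3, "b": 4, "c": 2},
    "u3": {"a": 5, "b": 5, "c": 1},
}

def item_mean(i):
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def item_sim(a, b):
    # Pearson correlation over the users who rated both items (Eq. 4.1).
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    ma, mb = item_mean(a), item_mean(b)
    num = sum((ratings[u][a] - ma) * (ratings[u][b] - mb) for u in users)
    den = (math.sqrt(sum((ratings[u][a] - ma) ** 2 for u in users))
           * math.sqrt(sum((ratings[u][b] - mb) ** 2 for u in users)))
    return num / den if den else 0.0

# Recipes "a" and "b" are rated alike by all three users, so they come
# out strongly similar; "c" is rated in the opposite pattern to "a".
```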
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and to compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance and comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation: these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
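A sketch of the binary feature vectors and cosine ranking; the feature space below is invented for illustration (the real component uses categories, regions, restaurant IDs, ingredients, and context features):

```python
import math

# Hypothetical feature space; each feature has a fixed vector position.
FEATURES = ["cat:soup", "cat:grill", "reg:north", "ing:beef", "ing:tofu"]

def to_vector(active):
    # Binary vector over the recipe feature space.
    return [1 if f in active else 0 for f in FEATURES]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Profile built from positively rated recipes: the union of their features.
profile = to_vector({"cat:grill", "reg:north", "ing:beef"})
recipe1 = to_vector({"cat:grill", "ing:beef"})
recipe2 = to_vector({"cat:soup", "ing:tofu"})

ranked = sorted([("recipe1", cosine(profile, recipe1)),
                 ("recipe2", cosine(profile, recipe2))],
                key=lambda t: t[1], reverse=True)  # most similar first
```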
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \qquad (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
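The mapping of Eq. (4.3) is a simple threshold rule; the example values below are illustrative:

```python
def similarity_to_rating(similarity, avg_total):
    # Eq. 4.3: bump the combined user/item average by half a point
    # when the recipe is very close (similarity > 0.8) to the user profile.
    return avg_total + 0.5 if similarity > 0.8 else avg_total

high = similarity_to_rating(0.9, 3.7)  # 4.2
low = similarity_to_rating(0.5, 3.7)   # 3.7
```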
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients can be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced and, lastly, the problem of transforming a similarity measure into a rating value is presented, along with the solutions explored in this work.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in
Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the Frequency of use of the feature Fk is assumed to
be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as defined in [3]:
    IRFk = log(M / Mk)                (4.4)
Where M is the total number of recipes and Mk is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
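As an illustration, the IRF weighting of Eq. 4.4 can be sketched in a few lines of Python (a minimal sketch; the function and variable names are hypothetical and not taken from the system's implementation):

```python
import math
from collections import Counter

def irf_weights(recipes):
    """Compute the Inverse Recipe Frequency (Eq. 4.4) for every ingredient.

    recipes: list of ingredient sets, one set per recipe.
    Returns a dict mapping ingredient -> log(M / Mk)."""
    M = len(recipes)          # total number of recipes
    Mk = Counter()            # number of recipes containing each ingredient
    for ingredients in recipes:
        Mk.update(set(ingredients))
    return {k: math.log(M / count) for k, count in Mk.items()}

# Example: rare ingredients receive higher weights than common ones;
# "tomato" appears in all 3 recipes, so its weight is log(3/3) = 0.
recipes = [{"tomato", "basil"}, {"tomato", "cheese"}, {"tomato", "saffron"}]
weights = irf_weights(recipes)
```

As in the text, the feature frequency is implicitly 1, so each recipe contributes its ingredients as a set.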
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values are positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach uses the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's prefer-
ences. These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype vec-
tor: in positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector; in negative observations, the feature weights are subtracted.
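The fixed-threshold variant of this procedure can be sketched as follows (illustrative code assuming a Foodcom-style 1-5 scale where 3 is neutral; the function and variable names are hypothetical):

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Build a user's Rocchio prototype vector from rated recipes.

    rated_recipes: list of (ingredient_set, rating) pairs for one user.
    irf: dict of ingredient -> IRF weight (Eq. 4.4).
    threshold: fixed rating threshold; ratings above it are positive
    observations, below it negative, and equal to it are ignored."""
    prototype = {}
    for ingredients, rating in rated_recipes:
        if rating == threshold:
            continue  # neutral observation: ignored
        sign = 1.0 if rating > threshold else -1.0
        for ing in ingredients:
            # positive observations add the IRF weight, negative ones subtract it
            prototype[ing] = prototype.get(ing, 0.0) + sign * irf.get(ing, 0.0)
    return prototype
```

Swapping the fixed `threshold` for the user's training-set average rating gives the second approach described above.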
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Foodcom, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented:
Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.
Min-Max method
The similarity values needed to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula:3

    B = (A - min(A)) / (max(A) - min(A)) * (D - C) + C        (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So, the following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A) - min(A)),
the user average was used as default for the recommendation.
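The per-user mapping described above can be sketched as follows (a minimal sketch with hypothetical names; the real system computes the per-user intervals from the validation and training sets as described):

```python
def minmax_rating(sim, user_sims, user_ratings):
    """Map a similarity value into a user's rating range via Min-Max (Eq. 4.5).

    sim: similarity returned by Rocchio for the recipe to recommend.
    user_sims: this user's similarity values, observed on the validation set.
    user_ratings: this user's ratings, taken from the training set."""
    lo, hi = min(user_sims), max(user_sims)
    c, d = min(user_ratings), max(user_ratings)
    if hi == lo:
        # not enough spread to compute the similarity interval:
        # fall back to the user's average rating
        return sum(user_ratings) / len(user_ratings)
    return (sim - lo) / (hi - lo) * (d - c) + c
```

For example, a similarity of 0.5 in a user whose similarities span [0, 1] and whose ratings span [1, 5] maps to the midpoint of the rating scale.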
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

    Rating = { average rating + standard deviation,  if similarity >= U
             { average rating,                       if L <= similarity < U
             { average rating - standard deviation,  if similarity < L        (4.6)
Three different approaches were tested: using the user's rating average and the user standard
deviation; using the recipe's rating average and the recipe standard deviation; and using the com-
bined average of the user and the recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
user profile is high, then the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine with more accuracy
the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L
is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Foodcom   Epicurious
Number of users                           24741         8117
Number of food items                     226025        14976
Number of rating events                  956826        86574
Number of ratings above avg              726467        46588
Number of groups                            108           68
Number of ingredients                      5074          338
Number of categories                         28           14
Sparsity on the ratings matrix            0.02%        0.07%
Avg rating values                          4.68         3.34
Avg number of ratings per user            38.67        10.67
Avg number of ratings per item             4.23         5.78
Avg number of ingredients per item         8.57         3.71
Avg number of categories per item          2.33         0.60
Avg number of food groups per item         0.87         0.61
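A sketch of Eq. 4.6 with the initial thresholds (illustrative code; the function name is hypothetical, and `avg` and `std` stand for whichever of the three variants is used):

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Turn a Rocchio similarity into a predicted rating (Eq. 4.6).

    avg, std: rating average and standard deviation computed from the
    training set (the user's, the recipe's, or the combined variant)."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

With `avg = 4.0` and `std = 0.5`, similarities of 0.9, 0.5 and 0.1 map to 4.5, 4.0 and 3.5 respectively.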
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommen-
dation system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large on-
line recipe sharing community.4 The second dataset is composed of crawled data obtained from a
website named Epicurious.5 This dataset initially contained 51324 active users and 160536 rated
recipes, but in order to reduce data sparsity the dataset was filtered: all recipes that were rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. Table
4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
  Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
  Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Cen-
  tral/South American, European, Mexican, Latin American, American, Greek, Indian, German,
  Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Foodcom rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
  Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way that ingredients are
represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen
by the website's users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures
4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Foodcom dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.6 Being a relational database, MySQL is excel-
lent for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines
and dietaries) and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first ex-
perimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides an insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the data was split into 5 folds, so the process is repeated 5 times, which is
also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training
set the remaining 80% of the data.
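The 5-fold split described above can be sketched as follows (a minimal illustration; the function name and shuffling seed are hypothetical):

```python
import random

def five_fold_splits(rating_events, seed=0):
    """Split rating events into 5 folds for cross-validation.

    Yields (training_set, validation_set) pairs; in each fold, 20% of
    the data is used for validation and the remaining 80% for training."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    k = 5
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

Each rating event appears in the validation set of exactly one fold, so every prediction is evaluated once.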
Figure 5.1: 10-fold cross-validation example

Accuracy is measured when comparing the known data from the validation set with the outputs of
the system (i.e. the prediction values). In the simplest case, the validation set presents information
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's previ-
ously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions gener-
ated by the algorithms, the MAE and RMSE measures can be computed. As previously men-
tioned in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
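The two error measures can be sketched in a few lines (a standard formulation; the function name is hypothetical):

```python
import math

def mae_rmse(predicted, actual):
    """Compute MAE and RMSE between predicted and actual ratings.

    MAE averages the absolute deviations; RMSE squares the deviations
    first, placing more emphasis on larger errors."""
    n = len(actual)
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
    return mae, rmse
```

The difference in emphasis between the two measures matters later, in Section 5.4, where the threshold variation test improves one measure while degrading the other.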
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some base-
lines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset aver-
ages as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item aver-
ages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
inputs, the recommendation system simply returns the userID average, or the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious            Foodcom
                                MAE      RMSE        MAE      RMSE
YoLP Content-based component  0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component  0.6454   0.8678      0.3761   0.6834
User Average                  0.6315   0.8338      0.4077   0.6207
Item Average                  0.7701   1.0930      0.4385   0.7043
Combined Average              0.6628   0.8572      0.4180   0.6250

Table 5.2: Test results

                             Epicurious                            Foodcom
                    Observation     Observation         Observation     Observation
                    User Average    Fixed Threshold     User Average    Fixed Threshold
                    MAE     RMSE    MAE     RMSE        MAE     RMSE    MAE     RMSE
User Avg + User
Standard Deviation  0.8217  1.0606  0.7759  1.0283      0.4448  0.6812  0.4287  0.6624
Item Avg + Item
Standard Deviation  0.8914  1.1550  0.8388  1.1106      0.4561  0.7251  0.4507  0.7207
User/Item Avg +
User and Item
Standard Deviation  0.8304  1.0296  0.7824  0.9927      0.4390  0.6506  0.4324  0.6449
Min-Max             0.8539  1.1533  0.7721  1.0705      0.6648  0.9847  0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user average rating value as threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the row entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as threshold to build the prototype vectors results in higher error values than using the
fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can
be drawn from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
and adjusted to return the best recommendations.

Table 5.3: Testing features

                                  Epicurious            Foodcom
                                MAE      RMSE        MAE      RMSE
Ingredients + Cuisine +
Dietaries                     0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine         0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary         0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary             0.8266   1.0616      0.4324   0.7087
Ingredients                   0.7932   1.0054      0.4411   0.6537
Cuisine                       0.8553   1.0810      0.5357   0.7431
Dietary                       0.8772   1.0807      0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine
and dietary. In content-based methods, it is important to determine whether all features are helping to
obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
  and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to trans-
  form the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
row of Table 5.3.
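The merging of per-feature sub-vectors and the cosine similarity used in the recommendation can be sketched as follows (illustrative code with hypothetical names, using a dict-based sparse-vector representation):

```python
import math

def merge_vectors(*vectors):
    """Merge per-feature prototype sub-vectors (e.g. ingredients,
    cuisines, dietaries) into a single prototype vector, so any
    feature combination can be tested without rebuilding the vectors."""
    merged = {}
    for vec in vectors:
        for key, weight in vec.items():
            merged[key] = merged.get(key, 0.0) + weight
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Testing, say, Ingredients + Cuisine then amounts to calling `merge_vectors(ingredients_vec, cuisine_vec)` before computing the similarity.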
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since we have more information available about them, and although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
example, the price of the meal, can increase the correlation between the user's preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

    Rating = { average rating + standard deviation,  if similarity >= U
             { average rating,                       if L <= similarity < U
             { average rating - standard deviation,  if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but now other cases need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and discover the similarity case thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and
Foodcom datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.

Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset

As a result of these tests, Eq. 4.6 was updated to:

    Rating = { average rating + standard deviation,  if similarity >= U
             { average rating,                       if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the devia-
tion between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it is predicting the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better re-
sults when using the Foodcom dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; each point is positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were observed towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset,
and the lighter density of points in the graph towards the higher values of standard deviation, probably
there was not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation for the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error of the users starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since a user's preferences in a real system are built over time, the objec-
tive of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a determined
number of reviews is reached. In order to perform this test, the datasets were first analysed to find a
group of users with enough rated recipes to study the improvements in the recommendations. The
Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the
highest threshold chosen for this dataset, in order to maintain a considerable number of users to average
the recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendations, although there is not a clear number of rated recipes that marks
a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Foodcom dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Foodcom dataset, up to 500 rated recipes
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recom-
mendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so vari-
ous approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into the rating value needed to compute the performance of the recommen-
dation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. These approaches combined returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Foodcom dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
These being two datasets with very different characteristics, not improving on the baseline results in
both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contains the main ingredients, which are chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of
detail, both from the recipes and from the prototype vectors; adding the major difference in dataset
sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendations, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method ex-
plored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example: season of the year (i.e. winter/fall or summer/spring), time of the day (i.e.
lunch or dinner), total meal cost, total calories, amongst others. The study of the impact that these
features have on the recommendation is another interesting point to approach in the future,
when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector would be compared with the user's set of vectors; according to the user's preferences,
the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
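This per-rating-class variant could be sketched as follows (an illustrative, hypothetical design, not something implemented in this work; it reuses a dict-based sparse-vector representation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_class_vectors(rated_recipes, irf):
    """One prototype vector per rating value, instead of a single vector.

    rated_recipes: list of (ingredient_set, rating) pairs for one user.
    irf: dict of ingredient -> IRF weight."""
    classes = {}
    for ingredients, rating in rated_recipes:
        vec = classes.setdefault(rating, {})
        for ing in ingredients:
            vec[ing] = vec.get(ing, 0.0) + irf.get(ing, 0.0)
    return classes

def predict_rating(recipe_features, class_vectors):
    """The class (rating value) whose vector is most similar to the recipe
    becomes the predicted rating directly, removing the need for a
    separate similarity-to-rating conversion step."""
    return max(class_vectors,
               key=lambda r: cosine(class_vectors[r], recipe_features))
```

A risk of this design, worth testing, is that sparsely populated rating classes (e.g. a user with few negative reviews) could produce unreliable class vectors.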
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 978-0-521-49336-9.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1-55860-486-3.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 1-55860-555-X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1-58113-348-0. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1-58113-459-2. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0-262-51129-0.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 978-988-19252-5-1.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3-642-13469-6. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
Figure 2.1: Popularity of different recommendation paradigms over publications in the areas of Computer Science (CS) and Information Systems (IS) [4]
metrics to retrieve items similar to the user's requirements. Knowledge-based methods are often used in hybrid recommendation systems, since they help to overcome certain limitations of collaborative and content-based systems, such as the well-known cold-start problem, which is explained later in this section.
In the rest of this section, some of the most popular approaches for content-based and collaborative methods are described, followed by a brief overview of hybrid recommendation systems.
2.1.1 Content-Based Methods
Content-based recommendation methods consist in matching the attributes of an object against a user profile, finally recommending the objects with the highest match. The user profile can be created implicitly, using the information gathered over time from user interactions with the system, or explicitly, where the profiling information comes directly from the user. Content-based recommendation systems can analyze two different types of data [5]:

• Structured Data: items are described by the same set of attributes used in the user profiles, and the values that these attributes may take are known;

• Unstructured Data: attributes do not have a well-known set of values, and content analyzers are usually employed to structure the information.
Content-based systems are designed mostly for unstructured data, in the form of free-text. As mentioned previously, content needs to be analysed and the information in it needs to be translated into quantitative values, so that a recommendation can be made. With the Vector Space Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance of that term to the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency (TF-IDF) measure is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \quad (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed over all keywords z in document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency (IDF) measure is also used. With this measure, rare keywords become more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right) \quad (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:

w_{ij} = TF_{ij} \times IDF_i \quad (2.3)
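As an illustration, the following Python sketch computes these weights for a toy corpus (the documents and keywords are made up for the example):

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights (Eqs. 2.1-2.3): TF is normalized by the most
    frequent keyword in each document, IDF is log(N / n_i)."""
    n_docs = len(docs)
    df = {}                                   # n_i: documents containing keyword i
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        max_f = max(counts.values())          # frequency of the most common keyword
        weights.append({t: (f / max_f) * math.log(n_docs / df[t])
                        for t, f in counts.items()})
    return weights

docs = [["pasta", "tomato", "pasta"], ["pasta", "basil"], ["cake", "sugar"]]
w = tf_idf(docs)
```

In the first document, the rarer keyword "tomato" ends up weighted higher than the more frequent, but corpus-wide common, keyword "pasta", which is exactly the effect the IDF factor is meant to produce.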
It is important to notice that TF-IDF does not identify the context in which words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document, and their occurrence in other documents, are taken into consideration when giving a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:

w_{ij} = \frac{TF\textrm{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K}(TF\textrm{-}IDF_{zj})^2}} \quad (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:

Similarity(a, b) = \frac{\sum_k w_{ka} \, w_{kb}}{\sqrt{\sum_k w_{ka}^2}\sqrt{\sum_k w_{kb}^2}} \quad (2.5)
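A small sketch of Eqs. (2.4) and (2.5) over sparse keyword-weight vectors (illustrative Python; the example vectors are made up):

```python
import math

def normalize(vec):
    """Cosine normalization (Eq. 2.4): divide each weight by the vector's L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def cosine_similarity(a, b):
    """Eq. 2.5 over sparse keyword-weight vectors (dicts: keyword -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = normalize({"pasta": 2.0, "tomato": 1.0})
profile = normalize({"pasta": 1.0, "basil": 1.0})
print(round(cosine_similarity(doc, profile), 3))  # → 0.632
```

Since the vectors are already cosine-normalized, the denominators in Eq. (2.5) equal one and the similarity reduces to a dot product.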
Rocchio's Algorithm

One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, the document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each class c_i, with T being the vocabulary, composed by the set of distinct terms in the training set. The weight for each term is given by the following formula:

w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \quad (2.6)

In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. The parameters \beta and \gamma control the influence of the positive and negative examples. A document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
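A minimal sketch of Rocchio classification (Eqs. 2.5 and 2.6) is shown below; the class names, example TF-IDF vectors, and the values chosen for β and γ are all assumptions made for illustration:

```python
import math

def prototype(pos, neg, beta=16.0, gamma=4.0):
    """Eq. 2.6: combine positive and negative TF-IDF example vectors
    into a class prototype (beta/gamma values are illustrative)."""
    terms = set().union(*pos, *neg)
    proto = {}
    for k in terms:
        p = sum(d.get(k, 0.0) for d in pos) / len(pos)
        n = sum(d.get(k, 0.0) for d in neg) / len(neg)
        proto[k] = beta * p - gamma * n
    return proto

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(protos, doc):
    """Assign the document to the class with the most similar prototype."""
    return max(protos, key=lambda c: cosine(protos[c], doc))

# Made-up TF-IDF vectors for two classes of recipes
relevant = [{"spicy": 0.9}, {"spicy": 0.7, "garlic": 0.5}]
irrelevant = [{"sweet": 0.8}]
protos = {"relevant": prototype(relevant, irrelevant),
          "irrelevant": prototype(irrelevant, relevant)}
print(classify(protos, {"spicy": 1.0}))  # → relevant
```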
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): the probability of observing a document in class c;

• P(d|c): the probability of observing the document d given a class c;

• P(d): the probability of observing the document d.

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying Bayes' theorem:

P(c|d) = \frac{P(c) \, P(d|c)}{P(d)} \quad (2.7)

When performing classification, each document d is assigned to the class c_j with the highest probability:

\arg\max_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)} \quad (2.8)

The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant and irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since the terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, registering whether or not the word appears in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:

P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)} \quad (2.9)

In the formula, N(d_i,t_k) represents the number of times the word, or term, t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
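The multinomial model of Eq. (2.9) can be sketched as follows; Laplace (add-one) smoothing is added here to avoid zero probabilities, and log-probabilities are used to prevent numeric underflow, both common practical choices not discussed above. The training documents are made up:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (class, list of words). Returns priors, per-class
    word counts, and the vocabulary."""
    priors, word_counts, vocab = Counter(), {}, set()
    for c, words in docs:
        priors[c] += 1
        word_counts.setdefault(c, Counter()).update(words)
        vocab.update(words)
    return priors, word_counts, vocab

def classify(model, words):
    """Pick the class maximizing log P(c) + sum of log P(t|c) counts."""
    priors, word_counts, vocab = model
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for c in priors:
        n_c = sum(word_counts[c].values())
        lp = math.log(priors[c] / total)
        for w in words:
            if w in vocab:  # only vocabulary words are used, as in Eq. 2.9
                lp += math.log((word_counts[c][w] + 1) / (n_c + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = train([("relevant", ["great", "recipe", "great"]),
               ("irrelevant", ["bad", "recipe"])])
print(classify(model, ["great", "recipe"]))  # → relevant
```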
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms, branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion when learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all the training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency, since they have no training phase and all the computation is made at classification time.
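A minimal k-nearest-neighbor sketch over structured data, using the Euclidean distance and a majority vote (the labels and rating vectors are illustrative):

```python
import math
from collections import Counter

def knn_classify(training, item, k=3):
    """training: list of (label, vector); vectors are equal-length tuples.
    Ranks all stored examples by Euclidean distance and takes a majority
    vote among the k closest ones."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[1], item))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

# Made-up items described by two numeric attributes
train_set = [("liked", (5, 4)), ("liked", (4, 5)), ("disliked", (1, 2)),
             ("disliked", (2, 1)), ("liked", (5, 5))]
print(knn_classify(train_set, (4, 4), k=3))  # → liked
```

Note that all distance computations happen inside `knn_classify`, which is precisely the classification-time cost discussed above.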
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the "wisdom of the crowd", and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, is now described.
Table 2.1: Ratings database for collaborative recommendation

        Item1   Item2   Item3   Item4   Item5
Alice     5       3       4       4       ?
User1     3       1       2       3       3
User2     4       3       4       3       5
User3     3       3       1       5       4
User4     1       5       5       2       1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in the set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2}\sqrt{\sum_{s \in S} r_{bs}^2}} \quad (2.10)

In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S}(r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{as} - \bar{r}_a)^2 \sum_{s \in S}(r_{bs} - \bar{r}_b)^2}} \quad (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \times (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)} \quad (2.12)

In the formula, pred(a, p) is the rating predicted for user a and item p, and N is the set of users most similar to a that have rated item p. This function calculates whether the neighbours' ratings for Alice's unseen Item5 are above or below their averages. The rating differences are combined, using the similarity scores as weights, and the result is added to, or subtracted from, Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
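Putting Eqs. (2.11) and (2.12) together with the data of Table 2.1, a small Python sketch can reproduce the prediction for Alice and Item5 (here each user's average is computed over all of that user's ratings, which is one possible reading of the formulas):

```python
import math

ratings = {  # Table 2.1
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(r):
    return sum(r.values()) / len(r)

def pearson(a, b):
    """Eq. 2.11 over the items rated in common by users a and b."""
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = mean(ratings[a]), mean(ratings[b])
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common) *
                    sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, neighbours):
    """Eq. 2.12: similarity-weighted deviations from the neighbours' means."""
    num = sum(pearson(a, b) * (ratings[b][item] - mean(ratings[b]))
              for b in neighbours)
    den = sum(pearson(a, b) for b in neighbours)
    return mean(ratings[a]) + num / den if den else mean(ratings[a])

print(round(predict("Alice", "Item5", ["User1", "User2"]), 2))  # → 4.85
```

The neighbours rate Item5 above their own averages, so Alice's prediction lands well above her average of 4.0.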
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, quality results comparable to or better than those of the best available user-based collaborative filtering algorithms [15, 16].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem: most of them use the hybrid recommendation approach presented in the next section, while other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be associated with content features (e.g., comedies or dramas liked by a user) in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation, and the weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
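A parallelized (weighted) hybrid can be sketched in a few lines; the two component scorers, their score tables, and the weights below are hypothetical:

```python
def weighted_hybrid(scorers, weights, user, item):
    """Linear combination of component scores; weights should sum to 1."""
    return sum(w * scorer(user, item) for scorer, w in zip(scorers, weights))

# Hypothetical components: a popularity-based score and a collaborative score,
# each returning a value in [0, 1]
popularity = {"Item1": 0.9, "Item2": 0.2}
def popularity_score(user, item):
    return popularity.get(item, 0.0)

collab = {("Alice", "Item1"): 0.4, ("Alice", "Item2"): 0.8}
def collaborative_score(user, item):
    return collab.get((user, item), 0.0)

score = weighted_hybrid([popularity_score, collaborative_score], [0.3, 0.7],
                        "Alice", "Item2")
print(round(score, 2))  # 0.3 * 0.2 + 0.7 * 0.8 → 0.62
```

Learning the weights dynamically would amount to tuning them against held-out ratings instead of fixing them by hand.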
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: the increase in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good, or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

Precision = \frac{tp}{tp + fp} \quad (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = \frac{tp}{tp + fn} \quad (2.14)
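Both measures can be sketched directly from the four counts, treating recommended and relevant items as sets (the item identifiers are illustrative):

```python
def precision_recall(recommended, relevant):
    """Eqs. 2.13 and 2.14: recommended and relevant are sets of item ids."""
    tp = len(recommended & relevant)   # relevant items that were recommended
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but missed
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

p, r = precision_recall({"r1", "r2", "r3", "r4"}, {"r1", "r2", "r5"})
# 2 of the 4 recommendations are relevant (precision 0.5);
# 2 of the 3 relevant items were found (recall 2/3)
```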
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between the predicted ratings and the actual ratings:

MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \quad (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \quad (2.16)
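Both error measures are straightforward to compute; the predicted and actual ratings below are made up for the example:

```python
import math

def mae(predicted, actual):
    """Eq. 2.15: mean absolute deviation between predictions and ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Eq. 2.16: squaring the deviations emphasises the larger ones."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual))
                     / len(actual))

predicted = [4.0, 3.5, 2.0]
actual = [5, 3, 2]
# MAE = (1 + 0.5 + 0) / 3 = 0.5; RMSE = sqrt((1 + 0.25 + 0) / 3) ≈ 0.645
print(mae(predicted, actual), rmse(predicted, actual))
```

On this toy example the single large error (1 rating point) pushes the RMSE above the MAE, illustrating the emphasis on larger deviations.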
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% when compared to Netflix's own recommendation algorithm at the time, called Cinematch.

1http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from the recipe browsing history (i.e., from recipes searched) and the cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the score I^+_k of the user's favourite ingredients, an equation based on the idea of TF-IDF is used:

I^+_k = FF_k \times IRF_k \quad (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D} \quad (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log\frac{M}{M_k} \quad (3.3)
The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing
history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they liked to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted
by $I^+_k$, were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of individual users' favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail, because the accuracy values
obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considers the ingredient quantity of a target
recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is
obtained as follows:

\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2} \tag{3.5} \]

¹http://cookpad.com
In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity
of the ingredient k in recipe i, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed
average quantity of the ingredient k over all the recipes in the database). According to the deviation
score, a weight $W_k$ is assigned to the ingredient. The recipe's final score R is computed considering
the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):

\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]

The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
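The deviation-based weighting just described can be sketched in Python. This is an illustrative reading of Eqs. (3.5) and (3.6), not the authors' code: since the mapping from $\sigma_k$ to the weight $W_k$ is not fully specified here, the weights are simply passed in as given.

```python
import math

def ingredient_std(quantities):
    # Eq. (3.5): population standard deviation of an ingredient's quantity
    # across the n recipes that contain it.
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe_ingredients, preference, weight):
    # Eq. (3.6): sum of preference (I_k) times weight (W_k) over the
    # ingredients k occurring in the recipe.
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0) for k in recipe_ingredients)

# Hypothetical quantities in grams: pepper varies little across recipes, potato a lot.
print(ingredient_std([10.0, 12.0, 14.0]))      # small deviation
print(ingredient_std([100.0, 200.0, 300.0]))   # large deviation
```

A disliked ingredient (negative $I_k$) with a large weight then pulls the recipe score down more strongly, matching the intuition about ingredient proportions described above.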
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces
the impact that sparse data has on the prediction accuracy. The domain of movie recommendations
was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem.
The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction
is computed as the weighted average of deviations from each neighbor's mean. Both these
methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based
predictor and the pure collaborative method.
CBCF essentially consists of performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and
of those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user
has rated many items, the content-based predictions are significantly better than if he has rated only
a few items. Lastly, the prediction is computed using a hybrid correlation
weight that allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].
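Eq. (3.7) amounts to merging the observed ratings with content-based predictions. A minimal dictionary-based sketch (names are illustrative, not from the cited framework):

```python
def pseudo_ratings(actual, content_predicted, items):
    # Eq. (3.7): keep the real rating r_ui where the user rated item i,
    # and fall back to the content-based prediction c_ui otherwise.
    return [actual[i] if i in actual else content_predicted[i] for i in items]

# The user rated item 0 only; items 1 and 2 get content-based predictions.
print(pseudo_ratings({0: 4}, {0: 3, 1: 2, 2: 5}, [0, 1, 2]))  # [4, 2, 5]
```

Applying this to every user yields the dense pseudo ratings matrix V on which the Pearson correlations are then computed.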
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE values of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods since it has been shown to perform consistently
better than pure collaborative filtering
Figure 3.1: Recipe - ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms
is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others.
The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differ
from one another in the approach used to compute user similarity. The hybrid recipe method identifies
a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the ingredient ratings obtained after the recipe is broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered: it is assumed that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as the evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implements a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known news article content-based recommendation system. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist when
performing the recommendation. Items too similar to others known by the user probably carry the
same information and will not help the user gather more information about a particular news topic.
These items are then excluded from the recommendation. On the other hand, items similar in topic
but not similar in content should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.

In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new unlabeled item, the
algorithm compares it to all stored items using a similarity function and then determines the nearest
neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story on a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it, and
so does not need a recommendation for a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].

This issue should be taken into consideration in food recommendations, as usually users are not
interested in recommendations with contents too similar to dishes recently eaten.
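The voting scheme above can be sketched as follows. The threshold values are placeholders, since the actual thresholds used by Daily-Learner are not given here; returning None stands for deferring the story to the long-term model:

```python
import math

def cosine(u, v):
    # Cosine similarity between two TF-IDF vectors (given as lists of floats).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_story_score(new_vec, rated, t_min=0.3, t_max=0.95):
    # rated: list of (tfidf_vector, score) pairs for stories the user rated.
    # Stories with similarity >= t_min become voters; a voter with
    # similarity >= t_max marks the new story as already known.
    voters = [(cosine(new_vec, v), s) for v, s in rated]
    voters = [(sim, s) for sim, s in voters if sim >= t_min]
    if not voters:
        return None, False  # no voters: defer to the long-term model
    known = any(sim >= t_max for sim, _ in voters)
    prediction = sum(sim * s for sim, s in voters) / sum(sim for sim, _ in voters)
    return prediction, known
```

In a food-recommendation setting, the same `known` flag could be used to suppress dishes too close to what the user recently ate, as suggested above.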
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented.
First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental recommendation
component where various approaches are explored to adapt Rocchio's algorithm for personalized
food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative
approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items
is measured from the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach two items are considered similar if they were rated in a similar way by the
same group of users.

¹https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, $r_{a,p}$ is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are recipe a's and recipe b's average ratings,
respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)} \tag{4.2} \]

²http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items
rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted
according to the similarity between b and the target item a. The predicted rating is then normalized
by the sum of the similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because
it is computationally more efficient when recommending a fixed group of recipes. Recommendations
in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant's recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [16, 15].
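Eqs. (4.1) and (4.2) can be sketched in plain Python. This is an illustrative implementation, not YoLP's actual code: ratings live in a nested user-to-item dictionary, the item averages in Eq. (4.1) are taken over all raters of each item, and only positively correlated items contribute to the prediction.

```python
import math

def item_mean(ratings, item):
    # Average rating of an item over all users who rated it.
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def pearson_item_sim(ratings, a, b):
    # Eq. (4.1): Pearson correlation over the users P who rated both a and b.
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    ma, mb = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[u][a] - ma) * (ratings[u][b] - mb) for u in common)
    den = math.sqrt(sum((ratings[u][a] - ma) ** 2 for u in common)) * \
          math.sqrt(sum((ratings[u][b] - mb) ** 2 for u in common))
    return num / den if den else 0.0

def predict(ratings, user, a):
    # Eq. (4.2): similarity-weighted average of the user's own ratings.
    pairs = [(pearson_item_sim(ratings, a, b), r)
             for b, r in ratings[user].items() if b != a]
    pairs = [(s, r) for s, r in pairs if s > 0]  # keep positively similar items
    denom = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / denom if denom else None
```

On a toy example where two recipes are rated near-identically by the same users, the similarity approaches 1 and the prediction reduces to the neighboring item's rating.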
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended
recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated; i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]
Here, avgTotal represents the combined user and item average for each recommendation. It
is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
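A direct transcription of Eq. (4.3), with the 0.5 boost and the 0.8 similarity threshold exposed as parameters (a sketch; the component's real code is not shown in this document):

```python
def similarity_to_rating(similarity, avg_total, boost=0.5, threshold=0.8):
    # Eq. (4.3): bump the combined user/item average avgTotal by `boost`
    # when the cosine similarity exceeds `threshold`.
    return avg_total + boost if similarity > threshold else avg_total

print(similarity_to_rating(0.9, 3.5))  # 4.0
print(similarity_to_rating(0.5, 3.5))  # 3.5
```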
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors,
and, lastly, the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in
Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature, $F_k$, is assumed to
be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:

\[ IRF_k = \log\frac{M}{M_k} \tag{4.4} \]

where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
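Computing the IRF weights of Eq. (4.4) reduces to counting, for each ingredient, the recipes that contain it. A minimal sketch over recipes represented as ingredient sets (illustrative names, not the system's code):

```python
import math

def irf_weights(recipes):
    # Eq. (4.4): IRF_k = log(M / M_k), where M is the total number of recipes
    # and M_k the number of recipes containing ingredient k.
    M = len(recipes)
    counts = {}
    for ingredients in recipes:
        for k in ingredients:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / m_k) for k, m_k in counts.items()}

weights = irf_weights([{"salt", "pepper"}, {"salt"}, {"salt", "basil"}, {"basil"}])
# Rarer ingredients get higher weights: pepper > basil > salt.
```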
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each, determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values are positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3: in this case, these are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set: if a rating event is lower than the user's average rating, it is
considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector:
in positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector; in negative observations, the feature weights are subtracted.
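The update rule just described can be sketched as follows (illustrative code, not the system's implementation). The default positive/negative split matches the Epicurious rule above, where ratings of 3 and above count as positive observations:

```python
def build_prototype(rated_recipes, weights, positive_threshold=3):
    # rated_recipes: list of (feature_list, rating) pairs from the training set.
    # Features of positively rated recipes add their IRF weight to the
    # prototype vector; features of negatively rated recipes subtract it.
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= positive_threshold else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * weights.get(f, 0.0)
    return prototype

profile = build_prototype([(["garlic", "mint"], 4), (["mint"], 1)],
                          {"garlic": 1.0, "mint": 2.0})
# profile == {"garlic": 1.0, "mint": 0.0}
```

A feature rated both ways cancels out: here "mint" ends at 0.0 after one positive and one negative observation, reflecting a neutral learned preference.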
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented:
Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So, the following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (maximum value of A
minus minimum value of A), the user average was used as the default for the recommendation.
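Eq. (4.5) in code (a direct transcription; returning None stands in for the fall-back to the user's average rating mentioned above, when the similarity interval cannot be computed):

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max):
    # Eq. (4.5): map a similarity A in [sim_min, sim_max] onto the
    # user's rating range [C, D].
    if sim_max == sim_min:
        return None  # degenerate interval: caller falls back to the user average
    return (sim - sim_min) / (sim_max - sim_min) * (rating_max - rating_min) + rating_min

print(min_max_rating(0.5, 0.0, 1.0, 1, 5))  # 3.0
```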
Using average and standard deviation values from training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} \]
Three different approaches were tested: using the user's rating average and the user's standard
deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined
average of the user's and the recipe's averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine with more accuracy
the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L
is 0.25.

³http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization of the datasets used in the experiments

                                        Food.com   Epicurious
  Number of users                         24,741        8,117
  Number of food items                   226,025       14,976
  Number of rating events                956,826       86,574
  Number of ratings above avg            726,467       46,588
  Number of groups                           108           68
  Number of ingredients                    5,074          338
  Number of categories                        28           14
  Sparsity on the ratings matrix           0.02%        0.07%
  Avg rating value                          4.68         3.34
  Avg number of ratings per user           38.67        10.67
  Avg number of ratings per item            4.23         5.78
  Avg number of ingredients per item        8.57         3.71
  Avg number of categories per item         2.33         0.60
  Avg number of food groups per item        0.87         0.61
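Eq. (4.6) in code, with the initial thresholds U = 0.75 and L = 0.25 as defaults (a sketch; avg and std may be the user's, the recipe's, or their combination, as described above):

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Eq. (4.6): shift the average rating up or down by one standard
    # deviation depending on where the similarity falls relative to U and L.
    if similarity >= upper:
        return avg + std
    if similarity < lower:
        return avg - std
    return avg

print(threshold_rating(0.8, 3.5, 0.5))  # 4.0
print(threshold_rating(0.5, 3.5, 0.5))  # 3.5
print(threshold_rating(0.1, 3.5, 0.5))  # 3.0
```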
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large online⁴
recipe sharing community. The second dataset is composed of crawled data obtained from a
website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated
recipes, but in order to reduce data sparsity, the dataset has been filtered: all recipes that were rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table
4.1, a statistical characterization of the two datasets is presented, after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South
American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
⁴http://www.food.com
⁵http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way that ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen
by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures
4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset, because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.

⁶http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental
results and baseline algorithms. In Section 5.3, a feature test is performed, to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed, to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system trained on the remaining data. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the k-fold cross-validation method was used: the data is partitioned into k equally sized segments (folds) and, in each round, one fold serves as the validation set while the remaining folds form the training set. To reduce variability, this process is repeated until every fold has been used once for validation, and the validation results are averaged over the rounds (see Fig. 5.1). In the experiments performed in this work, the chosen value for k was 5, i.e., 5-fold cross-validation, so for each fold the validation set represents 20% and the training set the remaining 80% of the data.
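The 5-fold splitting procedure described above can be sketched as follows (a minimal illustration; the function and variable names are not part of the YoLP evaluation module):

```python
import random

def five_fold_splits(rating_events, k=5, seed=42):
    """Partition rating events into k folds; each fold serves once as the
    20% validation set while the remaining folds form the 80% training set."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)          # shuffle reproducibly
    folds = [events[i::k] for i in range(k)]     # k disjoint segments
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Example with 100 synthetic (userID, itemID, rating) events
events = [(u, i, 3) for u in range(10) for i in range(10)]
for training, validation in five_fold_splits(events):
    assert len(validation) == 20 and len(training) == 80
```

Each event appears in exactly one validation set, so the evaluation covers the whole dataset over the 5 rounds.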
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
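Both measures can be computed directly from the lists of actual and predicted ratings; a minimal sketch:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute deviation between
    predicted and actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on larger errors."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 4]
print(mae(actual, predicted))   # 0.875
print(rmse(actual, predicted))  # larger than the MAE: the 2-vs-4 miss dominates
```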
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                    Epicurious          Food.com
                                    MAE     RMSE        MAE     RMSE
  YoLP Content-based component      0.6389  0.8279      0.3590  0.6536
  YoLP Collaborative component      0.6454  0.8678      0.3761  0.6834
  User Average                      0.6315  0.8338      0.4077  0.6207
  Item Average                      0.7701  1.0930      0.4385  0.7043
  Combined Average                  0.6628  0.8572      0.4180  0.6250
Table 5.2: Test Results

                                             Epicurious                             Food.com
                                             Observation      Observation          Observation      Observation
                                             User Average     Fixed Threshold      User Average     Fixed Threshold
                                             MAE     RMSE     MAE     RMSE         MAE     RMSE     MAE     RMSE
  User Avg + User Standard Deviation         0.8217  1.0606   0.7759  1.0283       0.4448  0.6812   0.4287  0.6624
  Item Avg + Item Standard Deviation         0.8914  1.1550   0.8388  1.1106       0.4561  0.7251   0.4507  0.7207
  User/Item Avg + User and Item
  Standard Deviation                         0.8304  1.0296   0.7824  0.9927       0.4390  0.6506   0.4324  0.6449
  Min-Max                                    0.8539  1.1533   0.7721  1.0705       0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
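These baseline predictors reduce to simple per-user and per-item averages over the training events; a minimal sketch (the function names are illustrative, not part of YoLP):

```python
from collections import defaultdict

def build_baselines(training):
    """Compute per-user and per-item average ratings from
    (user, item, rating) training events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_avg = {u: sum(r) / len(r) for u, r in by_user.items()}
    item_avg = {i: sum(r) / len(r) for i, r in by_item.items()}
    return user_avg, item_avg

def predict_combined(user_avg, item_avg, user, item):
    """Combined-average baseline: (UserAvg + ItemAvg) / 2."""
    return (user_avg[user] + item_avg[item]) / 2

training = [("u1", "r1", 4), ("u1", "r2", 2), ("u2", "r1", 5)]
user_avg, item_avg = build_baselines(training)
print(predict_combined(user_avg, item_avg, "u1", "r1"))  # (3.0 + 4.5) / 2 = 3.75
```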
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that the combination of both user and item average ratings and standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                       Epicurious          Food.com
                                       MAE     RMSE        MAE     RMSE
  Ingredients + Cuisine + Dietaries    0.7824  0.9927      0.4324  0.6449
  Ingredients + Cuisine                0.7915  1.0012      0.4384  0.6502
  Ingredients + Dietary                0.7874  0.9986      0.4342  0.6468
  Cuisine + Dietary                    0.8266  1.0616      0.4324  0.7087
  Ingredients                          0.7932  1.0054      0.4411  0.6537
  Cuisine                              0.8553  1.0810      0.5357  0.7431
  Dietary                              0.8772  1.0807      0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
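The selected combination can be sketched as follows. This is an illustration only: the Inverse Recipe Frequency weighting is assumed to be already applied to the recipe feature vectors, and the exact way the user and item statistics are combined is an assumption of this sketch, not a specification of the implemented component.

```python
def build_prototype(rated_recipes, beta=1.0, gamma=1.0, threshold=3):
    """Rocchio-style prototype vector: recipes rated above the fixed
    threshold of 3 count as positive observations, the rest as negative.
    `rated_recipes` is a list of (rating, feature_weights) pairs."""
    positive = [f for r, f in rated_recipes if r > threshold]
    negative = [f for r, f in rated_recipes if r <= threshold]
    prototype = {}
    for group, sign in ((positive, beta), (negative, -gamma)):
        for features in group:
            for term, weight in features.items():
                prototype[term] = prototype.get(term, 0.0) + sign * weight / len(group)
    return prototype

def similarity_to_rating(sim, user_avg, user_std, item_avg, item_std,
                         upper=0.75, lower=0.25):
    """Turn a cosine similarity into a rating using combined user/item
    averages and standard deviations (averaging the two statistics is an
    assumption made for this sketch)."""
    avg = (user_avg + item_avg) / 2
    std = (user_std + item_std) / 2
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std

prototype = build_prototype([(5, {"chicken": 1.0, "garlic": 0.5}),
                             (2, {"tofu": 1.0})])
print(prototype)  # positive terms get positive weight, negative terms negative
```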
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
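The merging step is straightforward; a minimal sketch (the key names are illustrative), assuming that the terms of the three feature types do not overlap, so a plain dictionary merge is safe:

```python
def merge_prototype(stored, active_features):
    """Merge the separately stored per-feature prototype vectors into a
    single vector containing only the feature types under test."""
    merged = {}
    for name in active_features:
        merged.update(stored[name])  # terms are disjoint across feature types
    return merged

stored = {
    "ingredients": {"chicken": 0.8, "garlic": 0.3},
    "cuisine": {"italian": 0.6},
    "dietary": {"vegetarian": -0.4},
}
print(merge_prototype(stored, ["ingredients", "cuisine"]))
# {'chicken': 0.8, 'garlic': 0.3, 'italian': 0.6}
```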
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, could increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } L \leq similarity < U \\
\text{average rating} - \text{standard deviation} & \text{if } similarity < L
\end{cases}
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and to discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\
\text{average rating} & \text{if } similarity < U
\end{cases} \quad (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were observed towards the higher values of the standard deviation, as that would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
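The incremental simulation described above can be sketched as follows. The `predict` argument stands in for the experimental recommendation component (here replaced by a trivial average-rating predictor, purely for illustration):

```python
def learning_curve(user_reviews, predict, max_reviews=40):
    """Simulate incremental learning: at round n, the first n reviews of each
    user form the training set and the rest the validation set; absolute
    prediction errors are averaged over all users at each round."""
    curve = []
    for n in range(1, max_reviews):
        errors = []
        for reviews in user_reviews:
            training, validation = reviews[:n], reviews[n:]
            if not validation:
                continue
            errors += [abs(rating - predict(training, item))
                       for item, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

def avg_predict(training, item):
    """Placeholder predictor: the average of the training ratings."""
    return sum(r for _, r in training) / len(training)

# 5 synthetic users, each with 10 (recipe, rating) reviews
users = [[("r%d" % i, 3 + (i % 2)) for i in range(10)] for _ in range(5)]
print(learning_curve(users, avg_predict, max_reviews=10))
```

In the actual experiment, `predict` would rebuild the user's prototype vector from the training reviews at each round before predicting the remaining ratings.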
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, together with the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example: the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to pursue in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Model (VSM), documents can be represented as vectors of weights associated with specific terms or keywords. Each keyword or term is considered to be an attribute, and its weight reflects the relevance associated between it and the document. This simple method is an example of how unstructured data can be approached and converted into a structured representation.
There are various term weighting schemes, but the Term Frequency-Inverse Document Frequency measure, TF-IDF, is perhaps the most commonly used amongst them [6]. As the name implies, TF-IDF is composed of two terms. The first, Term Frequency (TF), is defined as follows:

TF_{ij} = \frac{f_{ij}}{\max_z f_{zj}} \quad (2.1)

where, for a document j and a keyword i, f_{ij} corresponds to the number of times that i appears in j. This value is divided by the maximum f_{zj}, which corresponds to the maximum frequency observed over all keywords z in the document j.
Keywords that are present in many documents do not help in distinguishing different relevance levels, so the Inverse Document Frequency measure (IDF) is also used. With this measure, rare keywords are more relevant than frequent keywords. IDF is defined as follows:

IDF_i = \log\left(\frac{N}{n_i}\right) \quad (2.2)

In the formula, N is the total number of documents and n_i represents the number of documents in which the keyword i occurs. Combining the TF and IDF measures, we can define the TF-IDF weight of a keyword i in a document j as:
w_{ij} = TF_{ij} \times IDF_i \quad (2.3)
It is important to notice that TF-IDF does not identify the context in which the words are used. For example, when an article contains a phrase with a negation, as in "this article does not talk about recommendation systems", the negative context is not recognized by TF-IDF. The same applies to the quality of the document: two documents using the same terms will have the same weights attributed to their content, even if one of them is better written. Only the keyword frequencies in the document and their occurrence in other documents are taken into consideration when giving a weight to a term.
Normalizing the resulting vectors of weights, as obtained from Eq. (2.3), prevents longer documents from being preferred over shorter ones [5]. To normalize these weights, a cosine normalization is usually employed:
w_{ij} = \frac{TF\text{-}IDF_{ij}}{\sqrt{\sum_{z=1}^{K}(TF\text{-}IDF_{zj})^2}} \quad (2.4)
With keyword weights normalized to values in the [0, 1] interval, a similarity measure can be applied when searching for similar items. These can be documents, a user profile, or even a set of keywords, as long as they are represented as vectors containing weights for the same set of keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
Similarity(a, b) = \frac{\sum_k w_{ka} w_{kb}}{\sqrt{\sum_k w_{ka}^2}\,\sqrt{\sum_k w_{kb}^2}} \quad (2.5)
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the usage of relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates in the vector space model [7]. It allows users to rate documents returned by a retrieval system according to their information needs, later averaging this information to improve the retrieval. Rocchio's method can also be used as a classifier for content-based filtering. Documents are represented as vectors, where each component corresponds to a term, usually a word. The weight attributed to each word can be computed using the TF-IDF scheme. Using relevance feedback, document vectors of positive and negative examples are combined into a prototype vector for each class c. These prototype vectors represent the learning process in this algorithm. New documents are then classified according to the similarity between the prototype vector of each class and the corresponding document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, ..., w_{|T|i}) for each class c_i, with T being the vocabulary, i.e., the set of distinct terms in the training set. The weight for each term is given by the following formula:
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \quad (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. The parameters β and γ control the influence of the positive and negative examples. A document d_j is assigned to the class c_i with the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): the probability of observing a document in class c;
• P(d|c): the probability of observing the document d given a class c;
• P(d): the probability of observing the document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \quad (2.7)
When performing classification, each document $d$ is assigned to the class $c_j$ with the highest probability:

$\operatorname{argmax}_{c_j} \frac{P(c_j) \, P(d|c_j)}{P(d)}$  (2.8)
The probability $P(d)$ is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that $P(d|c_j)$ is determined based on individual word occurrences, rather than the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although the conditional independence assumption behind this simplification is clearly violated in practice, since terms in a document are not truly independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute indicating its appearance in a document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary $V$, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)}$  (2.9)

In the formula, $N(d_i,t_k)$ represents the number of times the word or term $t_k$ appears in document $d_i$. Therefore, only the words from the vocabulary $V$ that appear in the document (i.e., $t_k \in V_{d_i}$) are used.
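A minimal multinomial Naive Bayes classifier following Eq. (2.9) can be sketched as below. The use of log-probabilities and add-one (Laplace) smoothing are standard practical refinements, assumed here rather than taken from the text.

```python
# Minimal multinomial Naive Bayes sketch (Eq. 2.9). Log-probabilities avoid
# numeric underflow; Laplace smoothing handles unseen term/class pairs.
import math
from collections import Counter

def train_nb(labelled_docs):
    """labelled_docs: list of (list_of_terms, class_label) pairs."""
    classes = Counter(label for _, label in labelled_docs)   # document counts per class
    term_counts = {c: Counter() for c in classes}
    vocab = set()
    for terms, label in labelled_docs:
        term_counts[label].update(terms)
        vocab.update(terms)
    return classes, term_counts, vocab

def classify_nb(terms, model):
    """Return argmax_c log P(c) + sum_t N(d,t) * log P(t|c)."""
    classes, term_counts, vocab = model
    n_docs = sum(classes.values())
    best, best_lp = None, float("-inf")
    for c in classes:
        total = sum(term_counts[c].values())
        lp = math.log(classes[c] / n_docs)  # log P(c)
        for t in terms:
            # Laplace-smoothed log P(t|c); one addend per occurrence of t.
            lp += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Note that iterating over the document's term list, repeats included, implements the exponent $N(d_i,t_k)$ of the multinomial model.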
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning training data into subgroups, until those subgroups contain only instances of a single class. In the case of a document, the tree's internal nodes represent labelled terms, branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the $k$ nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is inefficiency at classification time, since they have no training phase and all the computation is performed when classifying.
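The lazy store-and-compare behaviour described above can be sketched as a short $k$-NN classifier over VSM-style vectors; the majority-voting rule and the dict representation are common conventions assumed for illustration.

```python
# Sketch of a k-nearest-neighbour classifier over sparse term-weight vectors
# (dicts), using cosine similarity and majority voting among the k closest.
from collections import Counter

def cosine_sim(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    if dot == 0.0:
        return 0.0
    norm = lambda x: sum(w * w for w in x.values()) ** 0.5
    return dot / (norm(u) * norm(v))

def knn_classify(item, training, k=3):
    """training: list of (vector, label) pairs; all data stays in memory."""
    neighbours = sorted(training, key=lambda ex: cosine_sim(item, ex[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

All the work happens inside `knn_classify`, which is exactly the classification-time cost the text points out as the method's main drawback.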
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since changes in item characteristics do not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item $p$ for user $c$, a set of ratings $S$ is used. This set contains ratings for item $p$ obtained from other users who have already rated that item, usually the $N$ most similar to user $c$. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation.

        Item1   Item2   Item3   Item4   Item5
Alice     5       3       4       4       ?
User1     3       1       2       3       3
User2     4       3       4       3       5
User3     3       3       1       5       4
User4     1       5       5       2       1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings $S$ previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set $S$. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. A similarity measure between users is thus needed for the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an $m$-dimensional space, where $m$ represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
$Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \sqrt{\sum_{s \in S} r_{bs}^2}}$  (2.10)
In the formula, $r_{as}$ is the rating that user $a$ gave to item $s$, and $r_{bs}$ is the rating that user $b$ gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way; the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items in common between them are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
$sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}}$  (2.11)
In the formula, $\bar{r}_a$ and $\bar{r}_b$ are the average ratings of user $a$ and user $b$, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}$  (2.12)
In the formula, $pred(a, p)$ is the predicted rating of item $p$ for user $a$, and $N$ is the set of users most similar to user $a$ that rated item $p$. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average ratings. The rating differences are combined, using the similarity scores as weights, and the result is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
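The steps above can be reproduced in a short sketch over the Table 2.1 data. Two modelling choices are assumptions on my part: the Pearson means are taken over co-rated items only, and the chosen neighbours are assumed to be positively correlated with the target user, so the plain similarity sum in the denominator of Eq. (2.12) is safe.

```python
# Worked sketch of Eqs. (2.11) and (2.12) on the Table 2.1 ratings.

ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    """Eq. (2.11): correlation over the items rated by both users."""
    common = ratings[a].keys() & ratings[b].keys()
    ra = sum(ratings[a][s] for s in common) / len(common)
    rb = sum(ratings[b][s] for s in common) / len(common)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = (sum((ratings[a][s] - ra) ** 2 for s in common)
           * sum((ratings[b][s] - rb) ** 2 for s in common)) ** 0.5
    return num / den if den else 0.0

def predict(a, item, neighbours):
    """Eq. (2.12): similarity-weighted deviation from each neighbour's mean."""
    mean = lambda u: sum(ratings[u].values()) / len(ratings[u])
    sims = {b: pearson(a, b) for b in neighbours}
    num = sum(sims[b] * (ratings[b][item] - mean(b)) for b in neighbours)
    den = sum(sims.values())
    return mean(a) + num / den if den else mean(a)
```

With User1 and User2 as neighbours, `predict("Alice", "Item5", ["User1", "User2"])` yields approximately 4.87, above Alice's average of 4.0, since both positively correlated neighbours rated Item5 above their own averages.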
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities $sim(a, b)$ in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user $c$ giving a particular rating to item $s$, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section, while other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, such as age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. Content-boosted collaborative filtering [20] is an example of this design, where social features (e.g., movies liked by a user) can be associated with content features (e.g., comedies or dramas liked by a user), in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied with two components that perform well individually, but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended ($tp$) out of all recommended items ($tp + fp$):

$Precision = \frac{tp}{tp + fp}$  (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended ($tp$) out of all relevant items ($tp + fn$):
$Recall = \frac{tp}{tp + fn}$  (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|$  (2.15)
In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}$  (2.16)
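The four measures of Eqs. (2.13) to (2.16) are straightforward to implement; a minimal sketch, assuming recommendations and relevance judgments are given as plain Python collections, could look like this:

```python
# Sketch of the evaluation measures in Eqs. (2.13)-(2.16).

def precision_recall(recommended, relevant):
    """Precision and recall from a recommended list and the relevant set."""
    tp = len(set(recommended) & set(relevant))
    return tp / len(recommended), tp / len(relevant)

def mae(predicted, actual):
    """Eq. (2.15): mean absolute deviation between predictions and ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Eq. (2.16): like MAE, but squaring penalizes large deviations more."""
    return (sum((p - r) ** 2 for p, r in zip(predicted, actual))
            / len(actual)) ** 0.5
```

For the same rating errors, `rmse` is always at least as large as `mae`, and the gap grows with the spread of the individual errors, which is precisely why RMSE emphasizes larger deviations.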
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy (RMSE) improvement of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.

¹ http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^+_k$, an equation based on the idea of TF-IDF is used:

$I^+_k = FF_k \times IRF_k$  (3.1)
$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$FF_k = \frac{F_k}{D}$  (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$IRF_k = \log \frac{M}{M_k}$  (3.3)
The user's disliked ingredients $I^-_k$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
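The favourite-ingredient score of Eqs. (3.1) to (3.3) can be sketched in a few lines. The function signature and the set-of-ingredients recipe representation are illustrative assumptions, not the original implementation.

```python
# Sketch of the TF-IDF-inspired favourite-ingredient score, Eqs. (3.1)-(3.3).
# Recipes are modelled as sets of ingredient names; the ingredient is assumed
# to appear in at least one recipe, so IRF_k is well defined.
import math

def favourite_score(ingredient, used_count, period_days, recipes):
    """I+_k = FF_k * IRF_k for a single ingredient."""
    ff = used_count / period_days                 # Eq. (3.2): FF_k = F_k / D
    containing = sum(1 for r in recipes if ingredient in r)
    irf = math.log(len(recipes) / containing)     # Eq. (3.3): IRF_k = log(M / M_k)
    return ff * irf                               # Eq. (3.1)
```

As with IDF in text retrieval, an ingredient that appears in every recipe gets $IRF_k = \log 1 = 0$, so frequent-but-ubiquitous ingredients (salt, water) contribute nothing to the preference profile, however often the user cooks with them.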
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely, and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top $N$ ingredients sorted by $I^+_k$ were computed. The F-measure is computed as follows:

$F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall}$  (3.4)
When focusing on the top ingredient ($N = 1$), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for $N$ were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients ($N = 20$), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with $N = 20$, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^+_k$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient $k$ contained in both recipes, the recipe with the higher quantity of $k$ should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the extended system also considers the ingredient quantities of a target recipe.
¹ http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient $k$ is obtained as follows:
$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_{k(i)} - \bar{g}_k)^2}$  (3.5)
In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_{k(i)}$ denotes the quantity of the ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_{k(i)}$ (i.e., the previously computed average quantity of the ingredient $k$ over all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^+_k$ and $I^-_k$, respectively):
$Score(R) = \sum_{k \in R} I_k \cdot W_k$  (3.6)
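Eqs. (3.5) and (3.6) can be sketched as follows. Since the exact mapping from the deviation score to the weight $W_k$ is not specified here, the z-score-based weight in the sketch is purely illustrative.

```python
# Sketch of quantity-aware recipe scoring (Eqs. 3.5-3.6). The z-score weight
# below is an assumption, standing in for the unspecified W_k mapping.

def std_dev(quantities):
    """Eq. (3.5): standard deviation of an ingredient's quantity over recipes."""
    n = len(quantities)
    mean = sum(quantities) / n
    return (sum((g - mean) ** 2 for g in quantities) / n) ** 0.5

def recipe_score(recipe, preferences, quantity_stats):
    """Eq. (3.6): Score(R) = sum of I_k * W_k over ingredients in the recipe.

    recipe: {ingredient: quantity}; preferences: {ingredient: I_k};
    quantity_stats: {ingredient: (mean_quantity, std_dev)}.
    """
    score = 0.0
    for k, qty in recipe.items():
        mean, sigma = quantity_stats[k]
        weight = (qty - mean) / sigma if sigma else 0.0  # illustrative W_k
        score += preferences.get(k, 0.0) * weight
    return score
```

With this choice of weight, an above-average quantity of a disliked ingredient ($I_k < 0$) lowers the recipe's score, matching the pepper example: the further the quantity deviates from the usual amount, the larger its influence.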
The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering, and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm: user-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions using the average of the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by the user $u$, where available, and those predicted by the content-based method otherwise:

$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}$  (3.7)
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix $V$ is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few rated items are available. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
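The densification step of Eq. (3.7) reduces to a simple fill-in rule; in the sketch below, the `content_predict` callable is a hypothetical stand-in for the pure content-based predictor.

```python
# Sketch of the CBCF pseudo user-ratings vector of Eq. (3.7): actual ratings
# where available, content-based predictions everywhere else.

def pseudo_vector(user_ratings, all_items, content_predict):
    """Return a dense rating vector v_u over all_items."""
    return {i: user_ratings[i] if i in user_ratings else content_predict(i)
            for i in all_items}
```

Stacking these vectors for every user yields the dense matrix $V$, over which the Pearson similarity of Section 2.1.2 can be computed without missing values.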
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of $N$ neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, assuming that the ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating, which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction: in some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information, and will not help the user gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory; when classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the $k$ nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations as usually users are not
interested in recommendations with contents too similar to dishes recently eaten
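The voting scheme just described can be sketched as follows. This is a minimal illustration, not Daily-Learner's actual code; the dictionary-based sparse TF-IDF vectors and the threshold values are assumptions made for the example:

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse TF-IDF vectors (dicts: term -> weight).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_score(new_story, rated_stories, min_sim=0.3, max_sim=0.95):
    # rated_stories: list of (tfidf_vector, score) pairs held by the short-term model.
    voters = [(cosine(new_story, vec), score) for vec, score in rated_stories]
    voters = [(s, score) for s, score in voters if s >= min_sim]
    if not voters:
        return None, "unclassified"   # falls through to the long-term model
    if any(s >= max_sim for s, _ in voters):
        return None, "known"          # the user probably saw this event already
    # Similarity-weighted average of the voting stories' scores.
    total = sum(s for s, _ in voters)
    return sum(s * score for s, score in voters) / total, "predicted"
```

A story identical to a rated one is flagged as known; a story with no sufficiently close neighbors is left for the long-term model.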
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in item-to-item, two items are considered similar if they were rated in a similar way by the same group of users.

1. https://www.python.org/

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)

2. http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html

In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating r_{u,b} for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of the similarities.
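The two steps above (the Pearson item-item similarity of Eq. 4.1 followed by the weighted prediction of Eq. 4.2) can be sketched as follows. This is an illustrative simplification; the nested-dictionary rating matrix is an assumed layout, not YoLP's actual data model:

```python
import math

def item_similarity(ratings, a, b):
    # ratings: dict of user -> {item: rating}. Pearson correlation (Eq. 4.1)
    # over the users P who rated both item a and item b.
    common = [u for u, r in ratings.items() if a in r and b in r]
    if len(common) < 2:
        return 0.0
    avg = lambda item: sum(ratings[u][item] for u in common) / len(common)
    ra, rb = avg(a), avg(b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in common)
    den = (math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in common)) *
           math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in common)))
    return num / den if den else 0.0

def predict(ratings, user, a):
    # Similarity-weighted average of the user's own ratings (Eq. 4.2),
    # keeping only positively correlated items.
    sims = [(item_similarity(ratings, a, b), r)
            for b, r in ratings[user].items() if b != a]
    sims = [(s, r) for s, r in sims if s > 0]
    total = sum(s for s, _ in sims)
    return sum(s * r for s, r in sims) / total if total else None
```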
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
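A minimal sketch of this profile-building and ranking step might look like the following; the string-prefixed feature encoding and the helper names are hypothetical choices made for illustration:

```python
import math

def build_profile(rated_recipes):
    # rated_recipes: list of (features, rating) pairs; features is a set of
    # strings such as {"cat:dessert", "region:mediterranean", "ing:basil"}.
    # Features of positively rated recipes (4 or 5) enter the profile as 1s.
    profile = set()
    for features, rating in rated_recipes:
        if rating >= 4:
            profile |= features
    return profile

def rank_recipes(profile, candidates):
    # For binary vectors, cosine similarity reduces to
    # |intersection| / sqrt(|profile| * |recipe features|).
    def sim(features):
        if not profile or not features:
            return 0.0
        return len(profile & features) / math.sqrt(len(profile) * len(features))
    # candidates: list of (recipe_id, features); most similar first.
    return sorted(candidates, key=lambda c: sim(c[1]), reverse=True)
```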
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined from the complete dataset.
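Computing these IRF weights over a dataset can be sketched as follows; representing each recipe as a set of feature strings is an assumption made for the example:

```python
import math

def irf_weights(recipes):
    # recipes: list of feature sets, one per recipe.
    # Returns IRF_k = log(M / M_k) for every feature k (Eq. 4.4),
    # where M is the number of recipes and M_k the number containing k.
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in features:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / Mk) for k, Mk in counts.items()}
```

A feature present in every recipe gets weight 0, while rare features get the largest weights.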
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
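Combining the IRF weights with the observation rule, the prototype vector can be built as in the following sketch. It implements only the fixed-threshold variant and, for brevity, omits the neutral-rating case of the Food.com scheme:

```python
def build_prototype(rated_recipes, weights, threshold=3):
    # rated_recipes: list of (features, rating) pairs; weights: feature -> IRF weight.
    # Ratings >= threshold are positive observations, the rest negative;
    # both observation types carry an equal weight of 1.
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * weights.get(k, 0.0)
    return prototype
```

A feature that appears equally often in liked and disliked recipes cancels out toward zero, while consistently liked features accumulate positive weight.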
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
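These steps can be sketched as follows; the fallback to the user average when the similarity interval collapses mirrors the default case described above, and the function signature is an assumption made for the example:

```python
def min_max_rating(sim, user_sims, user_ratings):
    # Map a similarity value into the user's own rating range (Eq. 4.5).
    # user_sims: the user's similarity values; user_ratings: the user's
    # known ratings from the training set.
    lo, hi = min(user_sims), max(user_sims)
    C, D = min(user_ratings), max(user_ratings)
    if hi == lo:
        # Not enough spread to define the interval: fall back to the average.
        return sum(user_ratings) / len(user_ratings)
    return (sim - lo) / (hi - lo) * (D - C) + C
```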
Using the average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)

Three different variants were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user and recipe averages and standard deviations.
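A sketch of the combined user/item variant of Eq. 4.6 follows; averaging the two means and the two standard deviations is an assumption about how the combination is formed, and the default thresholds are the initial values used in this work:

```python
import statistics

def rating_from_similarity(sim, user_ratings, item_ratings, U=0.75, L=0.25):
    # Combined variant of Eq. 4.6: the base is the mean of the user and item
    # averages; the offset is the mean of their standard deviations.
    avg = (statistics.mean(user_ratings) + statistics.mean(item_ratings)) / 2
    std = (statistics.pstdev(user_ratings) + statistics.pstdev(item_ratings)) / 2
    if sim >= U:
        return avg + std
    if sim >= L:
        return avg
    return avg - std
```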
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

3. http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                     Food.com   Epicurious
Number of users                      24741      8117
Number of food items                 226025     14976
Number of rating events              956826     86574
Number of ratings above avg          726467     46588
Number of groups                     108        68
Number of ingredients                5074       338
Number of categories                 28         14
Sparsity of the ratings matrix       0.02%      0.07%
Avg rating value                     4.68       3.34
Avg number of ratings per user       38.67      10.67
Avg number of ratings per item       4.23       5.78
Avg number of ingredients per item   8.57       3.71
Avg number of categories per item    2.33       0.60
Avg number of food groups per item   0.87       0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community4. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets after the filter was applied is presented.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

4. http://www.food.com
5. http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when performing a review.

Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

6. http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set; ideally, it is repeated until all possible combinations have been tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the process is repeated 5 times, a setup also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
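The 5-fold split can be sketched as follows; the shuffling and the interleaved fold assignment are implementation choices made for the example, not a description of the exact procedure used in this work:

```python
import random

def five_fold_splits(rating_events, seed=0):
    # Shuffle the rating events once, then yield (training, validation) pairs,
    # each fold holding out a disjoint 20% of the data.
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    k = 5
    for i in range(k):
        validation = events[i::k]  # every k-th event, starting at offset i
        training = [e for j, e in enumerate(events) if j % k != i]
        yield training, validation
```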
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
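As commonly defined, the two measures reduce to a few lines of code:

```python
import math

def mae(actual, predicted):
    # Mean Absolute Error: average of the absolute deviations.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root Mean Squared Error: squaring puts more emphasis on large deviations.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```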
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                               Epicurious          Food.com
                               MAE      RMSE       MAE      RMSE
YoLP Content-based component   0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component   0.6454   0.8678     0.3761   0.6834
User Average                   0.6315   0.8338     0.4077   0.6207
Item Average                   0.7701   1.0930     0.4385   0.7043
Combined Average               0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                           Epicurious                            Food.com
                               Observation     Observation         Observation     Observation
                               User Average    Fixed Threshold     User Average    Fixed Threshold
                               MAE     RMSE    MAE     RMSE        MAE     RMSE    MAE     RMSE
User Avg + User Std. Dev.      0.8217  1.0606  0.7759  1.0283      0.4448  0.6812  0.4287  0.6624
Item Avg + Item Std. Dev.      0.8914  1.1550  0.8388  1.1106      0.4561  0.7251  0.4507  0.7207
User/Item Avg + Std. Dev.      0.8304  1.0296  0.7824  0.9927      0.4390  0.6506  0.4324  0.6449
Min-Max                        0.8539  1.1533  0.7721  1.0705      0.6648  0.9847  0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616     0.4324   0.7087
Ingredients                         0.7932   1.0054     0.4411   0.6537
Cuisine                             0.8553   1.0810     0.5357   0.7431
Dietary                             0.8772   1.0807     0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
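The merging of the three stored sub-vectors can be sketched as follows; the dictionary layout and the feature-type names are assumptions, and the keys are assumed to be disjoint across feature types:

```python
def merge_prototype(stored, feature_types):
    # stored: dict like {"ingredients": {...}, "cuisine": {...}, "dietary": {...}},
    # one sparse sub-vector per feature type. Merging a subset of them yields
    # the prototype for that feature combination without rebuilding anything.
    merged = {}
    for ft in feature_types:
        merged.update(stored[ft])
    return merged
```

For example, testing the Ingredients + Cuisine combination only requires merging those two sub-vectors before computing the cosine similarity.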
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items they dislike, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the
objective of this test is to simulate the continuous learning of the algorithm, using the datasets
studied in this work, and to analyse whether the recommendation error starts to converge after a
certain number of reviews. To perform this test, the datasets were first analysed to find a group
of users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to keep a considerable number of users to average the
recommendation errors over (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors; for each
round, an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes marking
a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
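The incremental protocol described above, where each round moves one review from the validation set to the training set and re-measures the error, can be sketched as follows. This is an illustrative reconstruction, not the code used in the experiments; the `mean_predictor` baseline is a hypothetical stand-in for the actual Rocchio-based predictor.

```python
import random

def learning_curve(user_reviews, predict, rounds):
    """Simulate incremental learning: each round moves one more review into
    the training set and records the mean absolute error on the reviews
    still held out in the validation set."""
    reviews = list(user_reviews)
    random.shuffle(reviews)
    errors = []
    for n in range(1, min(rounds, len(reviews) - 1) + 1):
        train, validation = reviews[:n], reviews[n:]
        abs_errors = [abs(predict(train, recipe) - rating)
                      for recipe, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors

def mean_predictor(train, recipe):
    # Placeholder predictor: ignores the recipe and predicts the mean of
    # the training ratings (stand-in for the Rocchio-based component).
    return sum(r for _, r in train) / len(train)
```

Averaging the resulting error lists over the selected group of users yields curves analogous to Figs. 5.8 to 5.10.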
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recom-
mendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity
value returned by the algorithm into the rating value needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best
results used a fixed threshold to differentiate positive and negative observations. The combination
of both user and item average ratings and standard deviations gave the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation,
the similarity threshold test was performed to tune the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since the two datasets have very different characteristics, not improving on the baseline results in
both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information
only contained the main ingredients, chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both from the recipes and from the prototype vectors; adding the major difference in dataset
sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated.
Since there are very few studies related to food recommendation, the features that best describe
the recipes are still undefined. The feature study performed in this work, which explored all the
features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of
all features combined outperforms every feature individually, as well as the other pairwise
combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method ex-
plored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this
hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day
(i.e., lunch or dinner), total meal cost, or total calories, amongst others. The study of the impact
that these features have on the recommendation is another interesting direction for the future,
when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value. When recommending a recipe, its feature vector
would be compared with the user's set of vectors; according to the user's preferences, the vector
with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the target recipe would automatically attribute it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in Computer Science
and Information Systems - a landscape of research. In E-Commerce and Web Technologies,
pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web,
4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive
algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420,
1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender
systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation,
1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403,
1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recom-
mendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Trans-
actions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collab-
orative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for im-
proved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by
considering the user's preference and ingredient quantity of target recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredi-
ents. In Proceedings of the 18th International Conference on User Modeling, Adaptation,
and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Trans-
actions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe
recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model
selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
w_{ij} = \frac{\text{TF-IDF}_{ij}}{\sqrt{\sum_{z=1}^{K} (\text{TF-IDF}_{zj})^{2}}} \quad (2.4)
With keyword weights normalized to values in the [0,1] interval, a similarity measure can be
applied when searching for similar items. These can be documents, a user profile, or even a set
of keywords, as long as they are represented as vectors containing weights for the same set of
keywords. The cosine similarity metric, as presented in Eq. (2.5), is commonly used:
\mathrm{Similarity}(a, b) = \frac{\sum_{k} w_{ka} w_{kb}}{\sqrt{\sum_{k} w_{ka}^{2}} \sqrt{\sum_{k} w_{kb}^{2}}} \quad (2.5)
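As a concrete illustration of Eqs. (2.4) and (2.5), the following sketch normalizes sparse keyword-weight vectors and computes their cosine similarity. The dictionary representation of sparse vectors is an implementation choice for this example, not something prescribed by the text.

```python
import math

def normalize(weights):
    """Normalize a keyword-weight vector (Eq. 2.4): divide each TF-IDF
    weight by the Euclidean norm of the whole vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {k: w / norm for k, w in weights.items()} if norm else weights

def cosine_similarity(a, b):
    """Cosine similarity (Eq. 2.5) between two sparse weight vectors,
    given as dicts mapping keyword -> weight."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

The same function works for any pair of objects represented in the vector space model: two documents, a document and a user profile, or a profile and a query.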
Rocchio's Algorithm
One popular extension of the vector space model for information retrieval relates to the use of
relevance feedback. Rocchio's algorithm is a widely used relevance feedback method that operates
in the vector space model [7]. It allows users to rate documents returned by a retrieval system ac-
cording to their information needs, later averaging this information to improve the retrieval. Rocchio's
method can also be used as a classifier for content-based filtering. Documents are represented as
vectors where each component corresponds to a term, usually a word. The weight attributed to
each word can be computed using the TF-IDF scheme. Using relevance feedback, document vec-
tors of positive and negative examples are combined into a prototype vector for each class c. These
prototype vectors represent the learning process in this algorithm. New documents are then clas-
sified according to the similarity between the prototype vector of each class and the corresponding
document vector, using, for example, the well-known cosine similarity metric (Eq. 2.5). The document
is then assigned to the class whose prototype vector has the highest similarity value.
More specifically, Rocchio's method computes a prototype vector \vec{c}_i = (w_{1i}, \ldots, w_{|T|i}) for each
class c_i, where T is the vocabulary, composed by the set of distinct terms in the training set. The weight
for each term is given by the following formula:
w_{ki} = \beta \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \quad (2.6)
In the formula, POS_i and NEG_i represent the positive and negative examples in the training set for
class c_i, and w_{kj} is the TF-IDF weight for term k in document d_j. Parameters \beta and \gamma control the
influence of the positive and negative examples. The document d_j is assigned to the class c_i with
the highest similarity value between the prototype vector \vec{c}_i and the document vector \vec{d}_j.
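Eq. (2.6) can be sketched directly over sparse term-weight vectors. This is an illustrative implementation only; the default \beta and \gamma values below are arbitrary and not the ones used in the experimental component.

```python
def rocchio_prototype(pos_docs, neg_docs, beta=1.0, gamma=0.25):
    """Build a class prototype vector (Eq. 2.6): a beta-weighted mean of the
    positive example vectors minus a gamma-weighted mean of the negatives.
    Documents are sparse dicts mapping term -> TF-IDF weight."""
    prototype = {}
    groups = ((pos_docs, beta / len(pos_docs) if pos_docs else 0.0),
              (neg_docs, -gamma / len(neg_docs) if neg_docs else 0.0))
    for docs, coef in groups:
        for doc in docs:
            for term, w in doc.items():
                prototype[term] = prototype.get(term, 0.0) + coef * w
    return prototype
```

Classification then amounts to computing the cosine similarity between a new document vector and each class prototype and picking the class with the highest value.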
Although this method has an intuitive justification, it does not have any theoretical underpinnings,
and there are no performance or convergence guarantees [7]. In the general area of machine learn-
ing, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron
is a well-known example, shares many similarities with Rocchio's method and has been studied ex-
tensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various ma-
chine learning methods are other examples of techniques also used to perform content-based rec-
ommendation. These approaches use probabilities gathered from previously observed data in order
to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing
text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d be-
longing to a class c, using a set of probabilities previously calculated from the observed data (or
training data, as it is commonly called). These probabilities are:

- P(c): the probability of observing a document in class c;
- P(d|c): the probability of observing document d given class c;
- P(d): the probability of observing document d.
Using these probabilities, the probability P(c|d) of having a class c given a document d can be
estimated by applying the Bayes theorem:

P(c|d) = \frac{P(c) P(d|c)}{P(d)} \quad (2.7)
When performing classification, each document d is assigned to the class c_j with the highest
probability:

\operatorname{argmax}_{c_j} \frac{P(c_j) P(d|c_j)}{P(d)} \quad (2.8)
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus
does not influence the final result. Classes could simply represent, for example, relevant or irrelevant
documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is deter-
mined based on individual word occurrences, rather than on the document as a whole. This simplifica-
tion is needed due to the fact that it is very unlikely to see the exact same document more than once;
without it, the observed data would not be enough to generate good probabilities. Although this sim-
plification clearly violates the conditional independence assumption, since terms in a document are
not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves
very good results when classifying text documents. Two different models are commonly used when
working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli
event model, encodes each word as a binary attribute; this encoding relates to the appearance of
words in a document. The second, typically referred to as the multinomial event model, counts the
number of times each word appears in the document. Both models see the document as a vector
of values over a vocabulary V, and both lose the information about word order. Empirically,
the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model,
especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \quad (2.9)
In the formula, N(d_i, t_k) represents the number of times the word (or term) t_k appears in document d_i.
Therefore, only the words from the vocabulary V that appear in the document (t_k \in V_{d_i}) are used.
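Since the products in Eq. (2.9) can underflow for long documents, implementations usually score classes in log space, which preserves the argmax of Eq. (2.8). A minimal sketch, assuming the priors and per-class term probabilities have already been estimated from training data:

```python
import math
from collections import Counter

def multinomial_nb_score(doc_terms, class_prior, term_probs):
    """Log-space score for the multinomial event model (Eq. 2.9):
    log P(c) + sum over terms of N(d, t) * log P(t|c)."""
    counts = Counter(doc_terms)
    score = math.log(class_prior)
    for term, n in counts.items():
        if term in term_probs:  # terms outside the vocabulary are skipped
            score += n * math.log(term_probs[term])
    return score

def classify(doc_terms, classes):
    """Assign the document to the class with the highest score (Eq. 2.8).
    `classes` maps a class name to a (prior, term_probs) pair."""
    return max(classes, key=lambda c: multinomial_nb_score(doc_terms, *classes[c]))
```

In practice the term probabilities would be smoothed (e.g., Laplace smoothing) so that unseen vocabulary terms do not receive zero probability; the example above assumes they are already strictly positive.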
Decision trees and nearest neighbor methods are other examples of important learning algo-
rithms used in content-based recommendation systems. Decision tree learners build a decision tree
by recursively partitioning training data into subgroups, until those subgroups contain only instances
of a single class. In the case of documents, the tree's internal nodes represent labelled terms,
branches originating from them are labelled according to tests done on the weight that the term
has in the document, and leaves are labelled by categories. Instead of using weights, a partition
can also be formed based on the presence or absence of individual words. The attribute selection
criterion when learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all training data in memory. When classifying a new,
unlabeled item, the algorithm compares it to all stored items using a similarity function, and then
determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified
item is derived from the class labels of these nearest neighbors. The similarity function used by the
algorithm depends on the type of data: the Euclidean distance metric is often chosen when working
with structured data, while for items represented using the VSM, cosine similarity is commonly adopted.
Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback
is inefficiency at classification time: since they have no training phase, all the computation is done at
classification time.
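The nearest-neighbor procedure just described can be sketched in a few lines. This is a generic illustration (majority vote over the k most similar stored items), with cosine similarity on sparse vectors as the similarity function; ties and weighting schemes are deliberately ignored.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(item, training, k, similarity):
    """Label an item by majority vote among its k nearest neighbors.
    `training` is a list of (vector, label) pairs; all computation happens
    at classification time, as described above."""
    neighbors = sorted(training, key=lambda t: similarity(item, t[0]),
                       reverse=True)[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)
```

The full scan over `training` for every query is precisely the classification-time cost mentioned above.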
These algorithms represent some of the most important methods used in content-based recom-
mendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based
recommendation systems have several limitations. These methods are constrained to the features
explicitly associated with the recommended objects, and when these features cannot be parsed au-
tomatically by a computer, they have to be assigned manually, which is often not practical due to
resource limitations. Recommended items will also not be significantly different from anything
the user has seen before. Moreover, if only items that score highly against a user's profile can be
recommended, the similarity between them will also be very high. This problem is typically referred
to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user pro-
files, the user has to rate a sufficient number of items before the content-based recommendation
system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a par-
ticular user based on the items previously rated by other users. This approach is also known as the
wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar
tastes in the future. In order to better understand the users' tastes, or preferences, the system has
to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations,
and they have been widely used by large commercial websites. With the existence of various algo-
rithms and variations, these methods are very well understood and applicable in many domains,
since a change in item characteristics does not affect the method used to perform the recom-
mendation. These methods can be grouped into two general classes [11], namely memory-based
(or heuristic-based) approaches and model-based methods. Memory-based algorithms are essen-
tially heuristics that make rating predictions based on the entire collection of items previously rated
by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for
user c, a set of ratings S is used. This set contains ratings for item p obtained from other users
who have already rated that item, usually the N most similar to user c. A simple example of how to
generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to
Item5. Item5 is unknown to Alice, and the recommendation system needs to gener-
ate a prediction. The set of ratings S previously mentioned represents the ratings given by User1,
User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would
give to Item5. In the simplest case, the predicted rating is computed using the average of the values
contained in set S. However, the most common approach is to use a weighted sum, where the level
of similarity between users defines the weight to use when computing the rating. For example,
the rating given by the user most similar to Alice will have the highest weight when computing the
prediction. The similarity measure between users is used to simplify the rating estimation procedure
[12]. Two users have a high similarity value when they both rate the same group of items in an iden-
tical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional
space, where m represents the number of rated items they have in common. The similarity results
from computing the cosine of the angle between the two vectors:
\mathrm{Similarity}(a, b) = \frac{\sum_{s \in S} r_{as} r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^{2}} \sqrt{\sum_{s \in S} r_{bs}^{2}}} \quad (2.10)
In the formula, r_{as} is the rating that user a gave to item s, and r_{bs} is the rating that user b gave
to the same item s. However, this measure does not take an important factor into consideration,
namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a
similar way: the difference in rating values between the four items is practically consistent. With
the cosine similarity measure, these users are considered highly similar, which may not always be
the case, since only the items they have in common are contemplated. In fact, if Alice usually rates
items with low values, we can conclude that these four items are amongst her favourites. On the
other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It
is then clear that the average ratings of each user should be analyzed, in order to consider the
differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-
based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
\mathrm{sim}(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^{2} \sum_{s \in S} (r_{bs} - \bar{r}_b)^{2}}} \quad (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two
similarity measures, we can now generate a prediction using a common prediction function:

\mathrm{pred}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \quad (2.12)
In the formula, pred(a, p) is the predicted rating of item p for user a, and N is the set of users
most similar to user a that rated item p. This function calculates whether the neighbors' ratings for
Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined,
using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's
average rating. The value obtained through this procedure corresponds to the predicted rating.
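The two steps just described, Pearson similarity (Eq. 2.11) followed by the weighted-sum prediction (Eq. 2.12), can be sketched as follows. This is an illustrative reconstruction: the neighbourhood size k = 2 is arbitrary, and the denominator uses absolute similarities, a common variant that keeps the weights normalized when negative correlations appear.

```python
import math

def pearson(a, b):
    """Pearson correlation (Eq. 2.11) over the items rated by both users,
    here using each user's mean over those common items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    ma = sum(a[s] for s in common) / len(common)
    mb = sum(b[s] for s in common) / len(common)
    num = sum((a[s] - ma) * (b[s] - mb) for s in common)
    den = math.sqrt(sum((a[s] - ma) ** 2 for s in common)
                    * sum((b[s] - mb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, users, k=2):
    """Weighted-sum prediction (Eq. 2.12): user a's average plus the
    similarity-weighted, mean-centred ratings of the k nearest neighbours."""
    ra = sum(users[a].values()) / len(users[a])
    sims = sorted(((pearson(users[a], users[b]), b)
                   for b in users if b != a and item in users[b]),
                  reverse=True)[:k]
    num = sum(s * (users[b][item] - sum(users[b].values()) / len(users[b]))
              for s, b in sims)
    den = sum(abs(s) for s, _ in sims)
    return (ra + num / den) if den else ra
```

Applied to the ratings of Table 2.1, this predicts a rating above Alice's average of 4 for Item5, since her two most similar neighbours (User1 and User2) rated Item5 above their own averages.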
Different recommendation systems may take different approaches in order to implement user
similarity calculations and rating estimations as efficiently as possible. According to [12], one com-
mon strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once
in a while, since the network of peers usually does not change dramatically in a short period of
time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on
demand, using the precomputed similarities. Many other performance-improving modifications have
been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between
users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques
to compute similarities between items instead, later computing ratings from them [15]. Empiri-
cal evidence has been presented suggesting that item-based algorithms can provide, with better
computational performance, comparable or better quality results than the best available user-based
collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then
used to make rating predictions. Probabilistic approaches estimate the probability of a certain user
c giving a particular rating to item s, given the user's previously rated items. This estimation can be
computed, for example, with cluster models, where like-minded users are grouped into classes. The
model structure is that of a Naive Bayesian model, where the number of classes and the parameters of
the model are learned from the data. Other collaborative filtering methods include statistical models,
linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new user problem, also known as the cold start problem, also occurs in collaborative meth-
ods: the system must first learn the user's preferences from previously rated items in order to
perform accurate recommendations. Several techniques have been proposed to address this prob-
lem. Most of them use the hybrid recommendation approach presented in the next section; other
techniques use strategies based on item popularity, item entropy, user personalization, and combi-
nations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until
a new item is rated by a sufficient number of users, the recommender system will not recommend
it. Hybrid methods can also address this problem. Data sparsity is another problem that should
be considered, as the number of rated items is usually very small when compared to the number of
ratings that need to be predicted. User profile information, like age, gender, and other attributes, can
also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limita-
tions. The idea behind hybrid systems [19] is to combine two or more different elements, in order to
avoid some of the shortcomings, and even reach desirable properties not present in the individual
approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization
designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are
not architecturally separate. The objective behind this design is to exploit different features or knowl-
edge sources from each strategy to generate a recommendation. An example of this design is
content-boosted collaborative filtering [20], where social features (e.g., movies liked by the user) can
be associated with content features (e.g., comedies or dramas liked by the user) in order to improve
the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation
components receive the same input as if they were working independently. A weighting or a voting
scheme is then applied to obtain the recommendation; weights can be assigned manually or learned
dynamically. This design can be applied with two components that perform well individually, but
complement each other in different situations (e.g., when few ratings exist, recommend popular
items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques
are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade
and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each
succeeding recommender only refines the recommendations of its predecessor. In a meta-level
hybridization design, one recommender builds a model that is then exploited by the principal
component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algo-
rithms can be improved by being hybridized with other techniques. It is important that the recom-
mendation techniques used in the hybrid system complement each other's limitations. For instance,
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area
of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective, many variables can be, and have been, studied: increases in the number of
sales, profits, and item popularity are some example measures that can be applied in practice. From
the platform perspective, the general interactivity with the platform and click-through rates can be
analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent
valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of
recommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.

Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

\[ \text{Precision} = \frac{tp}{tp + fp} \tag{2.13} \]
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ \text{Recall} = \frac{tp}{tp + fn} \tag{2.14} \]
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \tag{2.15} \]
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \tag{2.16} \]
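As a concrete illustration, the four measures above (Eqs. 2.13 to 2.16) can be computed directly. This is a minimal sketch, not code from the thesis; the function names are illustrative.

```python
import math

def precision(tp, fp):
    # Eq. (2.13): fraction of recommended items that are relevant
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. (2.14): fraction of relevant items that were recommended
    return tp / (tp + fn)

def mae(predicted, actual):
    # Eq. (2.15): mean absolute deviation between predicted and actual ratings
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

def rmse(predicted, actual):
    # Eq. (2.16): like MAE, but larger deviations weigh more heavily
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(predicted))
```

For the same set of prediction errors, RMSE is always greater than or equal to MAE, which is why it is preferred when large mistakes should be penalized more.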
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches The
works described in this chapter contain interesting features to further explore in the context of per-
sonalized food recommendation using content-based methods
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:

\[ I^{+}_{k} = FF_k \times IRF_k \tag{3.1} \]

FF_k is the frequency of use (F_k) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \tag{3.2} \]

The notion of IDF (inverse document frequency) is specified in Eq. (3.3), through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

\[ IRF_k = \log \frac{M}{M_k} \tag{3.3} \]
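To make Eqs. (3.1) to (3.3) concrete, the sketch below computes the FF-IRF score of an ingredient over a small recipe collection. The function names and the data layout (recipes as sets of ingredient names) are assumptions for illustration, not code from [3].

```python
import math

def irf(ingredient, recipes):
    # Eq. (3.3): IRF_k = log(M / M_k), where M is the total number of recipes
    # and M_k is the number of recipes containing the ingredient
    m = len(recipes)
    m_k = sum(1 for r in recipes if ingredient in r)
    return math.log(m / m_k)

def favourite_score(ingredient, use_count, period, recipes):
    # Eqs. (3.1)-(3.2): I+_k = FF_k * IRF_k, with FF_k = F_k / D
    ff = use_count / period
    return ff * irf(ingredient, recipes)
```

As with IDF in text retrieval, an ingredient appearing in every recipe gets an IRF of zero, so frequently-used but ubiquitous ingredients (salt, oil) contribute little to the preference estimate.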
The user's disliked ingredients I-_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+_k, were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4} \]
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of individual users' favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
1 http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2} \tag{3.5} \]
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+_k and I-_k, respectively):

\[ \text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \tag{3.6} \]
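A small sketch of this scoring scheme follows. How the weight W_k is derived from the deviation score is not detailed above, so it is taken here as an externally supplied mapping; all names are illustrative.

```python
import math

def ingredient_std(quantities):
    # Eq. (3.5): population standard deviation of the quantities of an
    # ingredient over the n recipes that contain it
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(ingredients, preference, weight):
    # Eq. (3.6): Score(R) = sum over k in R of I_k * W_k, where I_k encodes
    # the user's liking (positive) or disliking (negative) of ingredient k
    return sum(preference[k] * weight[k] for k in ingredients)
```

An ingredient with a small standard deviation (e.g., pepper) makes any departure from its usual quantity significant, which is what motivates weighting ingredients by their dispersion rather than treating 100 grams of every ingredient alike.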
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \tag{3.7} \]
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
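Filling a user's pseudo ratings vector, as defined in Eq. (3.7), can be sketched as follows. The content-based predictor is passed in as a function; the names and data layout are illustrative, not taken from [20].

```python
def pseudo_ratings(actual_ratings, content_predict, all_items):
    # Eq. (3.7): use the actual rating where the user provided one,
    # and fall back to the content-based prediction otherwise
    return {i: actual_ratings[i] if i in actual_ratings else content_predict(i)
            for i in all_items}
```

Applying this to every user yields one row of the dense pseudo ratings matrix V per user, on which the Pearson-based collaborative step then operates without the original sparsity.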
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods since it has been shown to perform consistently
better than pure collaborative filtering
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
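The simplistic decomposition described above can be sketched as follows: each ingredient's score is the average rating of the recipes it occurs in. The data layout is an assumption for illustration.

```python
def ingredient_scores(recipe_ratings, recipe_ingredients):
    # Score each ingredient by averaging the ratings of the recipes
    # in which it occurs
    totals, counts = {}, {}
    for recipe, rating in recipe_ratings.items():
        for ing in recipe_ingredients[recipe]:
            totals[ing] = totals.get(ing, 0.0) + rating
            counts[ing] = counts.get(ing, 0) + 1
    return {ing: totals[ing] / counts[ing] for ing in totals}
```

Note the simplification this implies: every ingredient in a recipe inherits the full recipe rating equally, regardless of its quantity or its actual contribution to the user's opinion.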
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differentiate from one another by the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, under the assumption that the common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information, and will not help them to gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
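The short-term prediction step above can be sketched as follows. The (similarity, score) input layout and the threshold values are assumptions for illustration, not the actual Daily-Learner implementation.

```python
def predict_story_score(rated, min_sim, max_sim):
    # rated: (similarity, score) pairs for the user's rated stories,
    # with similarities precomputed (cosine over TF-IDF vectors)
    voting = [(s, score) for s, score in rated if s > min_sim]
    if not voting:
        return None, "unclassified"  # handed off to the long-term model
    # a voter above the maximum threshold means the story is already known
    label = "known" if any(s > max_sim for s, _ in voting) else "new"
    prediction = sum(s * score for s, score in voting) / sum(s for s, _ in voting)
    return prediction, label
```

The two thresholds play opposite roles: the minimum threshold filters out stories too dissimilar to vote, while the maximum threshold filters out recommendations that would be redundant for the user.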
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach, explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the same group of users.
1 https://www.python.org

Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

\[ sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)} \tag{4.2} \]
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value to user u for item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is then normalized by the sum of similarities.
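Eqs. (4.1) and (4.2) can be sketched as below; the dictionary-based data layout is an assumption for illustration, and the absolute values in the prediction denominator are a common refinement that guards against negative correlations.

```python
import math

def item_similarity(ratings_a, ratings_b):
    # Eq. (4.1): Pearson correlation between two recipes, over the users
    # in P who rated both; the means are the recipes' average ratings
    shared = set(ratings_a) & set(ratings_b)
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in shared)
    den = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in shared) *
                    sum((ratings_b[p] - mean_b) ** 2 for p in shared))
    return num / den if den else 0.0

def predict(user_ratings, target, item_ratings):
    # Eq. (4.2): the user's rating of each rated item b is weighted by
    # sim(a, b), and the result is normalized by the sum of similarities
    sims = {b: item_similarity(item_ratings[target], item_ratings[b])
            for b in user_ratings}
    den = sum(abs(s) for s in sims.values())
    if den == 0:
        return None
    return sum(sims[b] * r for b, r in user_ratings.items()) / den
```

Because the similarities between a fixed set of recipes can be precomputed and cached, only the final weighted average has to run at recommendation time, which is the computational advantage discussed next.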
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
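The profile construction and matching described above can be sketched as follows, with sparse vectors represented as dictionaries; names and layout are assumptions for illustration.

```python
import math

def build_profile(rated_recipes, positive_min=4):
    # Add the features of positively rated recipes (rating of 4 or 5)
    # to the profile as binary values
    profile = {}
    for features, rating in rated_recipes:
        if rating >= positive_min:
            for f in features:
                profile[f] = 1
    return profile

def cosine(u, v):
    # Cosine similarity between two sparse binary feature vectors (Eq. 2.5)
    dot = sum(u[f] * v[f] for f in u if f in v)
    norm = (math.sqrt(sum(x * x for x in u.values())) *
            math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0
```

Ranking the restaurant's recipes by their cosine similarity to this profile yields the ordered recommendation list described above.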
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In the collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ \text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]
Here, avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods, using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

\[ IRF_k = \log \frac{M}{M_k} \tag{4.4} \]

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3: these are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in the next Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
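A sketch of the prototype-vector construction under the first (fixed-cutoff) approach, with the IRF weights of Section 4.3.1; function names and data layout are illustrative.

```python
def build_prototype(rated_recipes, irf_weight, positive_min):
    # Add the IRF feature weights for positive observations and subtract
    # them for negative ones; both observation types weigh 1
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1 if rating >= positive_min else -1
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * irf_weight[f]
    return prototype
```

For the Epicurious scale, positive_min would be 3; for the Food.com scale, rating events equal to 3 would be filtered out before calling this function, since they are treated as neutral.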
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way, or have the same notion of high or low rating values. Thus, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user's average was used as default for the recommendation.
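Eq. (4.5), together with the per-user fallback just described, can be sketched as follows (names are illustrative):

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    # Eq. (4.5): map the user's similarity range onto the user's rating range;
    # fall back to the user's average when the similarity interval is degenerate
    if sim_max == sim_min:
        return user_avg
    return ((similarity - sim_min) / (sim_max - sim_min)
            * (rating_max - rating_min) + rating_min)
```

Because sim_min, sim_max, rating_min, and rating_max are computed per user, two users with the same raw similarity can legitimately receive different predicted ratings.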
Using average and standard deviation values from training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ \text{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} \]
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user's and the recipe's averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                         24,741      8,117
Number of food items                    226,025     14,976
Number of rating events                 956,826     86,574
Number of ratings above avg.            726,467     46,588
Number of groups                        108         68
Number of ingredients                   5,074       338
Number of categories                    28          14
Sparsity on the ratings matrix          0.02%       0.07%
Avg. rating values                      4.68        3.34
Avg. number of ratings per user         38.67       10.67
Avg. number of ratings per item         4.23        5.78
Avg. number of ingredients per item     8.57        3.71
Avg. number of categories per item      2.33        0.60
Avg. number of food groups per item     0.87        0.61
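With the initial thresholds, Eq. (4.6) reduces to a simple three-way rule, sketched below. The average/deviation pair can come from the user, from the recipe, or from their combination, matching the three approaches tested.

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Eq. (4.6): push the rating one standard deviation up or down,
    # depending on which side of the similarity thresholds it falls
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```

Unlike the Min-Max method, this rule can only produce three distinct values per user/recipe pair, which is the trade-off for its robustness when few ratings are available.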
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommen-
dation system in order to generate recommendations The data for the experiments is provided by
two datasets The first dataset was previously made available by [25] collected from a large on-
line4 recipe sharing community The second dataset is composed by crawled data obtained from a
website named Epicurious5 This dataset initially contained 51324 active users and 160536 rated
recipes but in order to reduce data sparsity the dataset has been filtered All recipes that are rated
no more than 3 times were removed as well as the users who rate no more than 5 times In table
41 a statistical characterization for the two datasets is presented after the filter was applied
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different sets of observations as the validation set. Ideally, the process is repeated until all possible combinations have been tested, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds and the process repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
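The folding procedure can be sketched as follows (an illustrative partition, not the exact implementation used in the experiments):

```python
import random

def five_fold_splits(observations, seed=42):
    """Yield (train, validation) pairs: each fold holds 20% of the data
    out for validation and trains on the remaining 80%."""
    data = list(observations)
    random.Random(seed).shuffle(data)
    folds = [data[i::5] for i in range(5)]
    for k in range(5):
        validation = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, validation
```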
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
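Both measures can be computed directly from the paired prediction/rating lists; this is the standard formulation, shown here for reference:

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```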
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                               Epicurious           Food.com
                               MAE      RMSE        MAE      RMSE
YoLP Content-based component   0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component   0.6454   0.8678      0.3761   0.6834
User Average                   0.6315   0.8338      0.4077   0.6207
Item Average                   0.7701   1.0930      0.4385   0.7043
Combined Average               0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                                    Epicurious                             Food.com
                                                    Observation       Observation          Observation       Observation
                                                    User Average      Fixed Threshold      User Average      Fixed Threshold
                                                    MAE      RMSE     MAE      RMSE        MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                  0.8217   1.0606   0.7759   1.0283      0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                  0.8914   1.1550   0.8388   1.1106      0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation    0.8304   1.0296   0.7824   0.9927      0.4390   0.6506   0.4324   0.6449
Min-Max                                             0.8539   1.1533   0.7721   1.0705      0.6648   0.9847   0.6303   0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
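The average-based baselines amount to simple lookups over precomputed per-user and per-item means (an illustrative sketch; all names are assumptions):

```python
def baseline_prediction(user_id, recipe_id, user_avg, item_avg, mode="combined"):
    """Return a baseline rating prediction from precomputed averages.
    `user_avg` and `item_avg` are dicts mapping ids to mean ratings."""
    if mode == "user":
        return user_avg[user_id]
    if mode == "item":
        return item_avg[recipe_id]
    # Combined average: (UserAvg + ItemAvg) / 2
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```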
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as a threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by the Rocchio algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                     Epicurious           Food.com
                                     MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616      0.4324   0.7087
Ingredients                          0.7932   1.0054      0.4411   0.6537
Cuisine                              0.8553   1.0810      0.5357   0.7431
Dietary                              0.8772   1.0807      0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
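A minimal sketch of this idea, assuming sparse vectors stored as dicts and disjoint feature namespaces (e.g. keys prefixed with ing:, cui:, diet:, an assumption not stated in the text):

```python
import math

def merge_vectors(*vectors):
    """Merge per-feature prototype vectors (dicts of feature -> weight)
    into one sparse vector, so any subset of the ingredient, cuisine,
    and dietary vectors can be combined per test. Assumes the feature
    keys of the individual vectors do not collide."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0
```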
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by the Rocchio algorithm into a rating value:

Rating =
    average rating + standard deviation,  if similarity >= U
    average rating,                       if L <= similarity < U
    average rating - standard deviation,  if similarity < L
(4.6)

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating =
    average rating + standard deviation,  if similarity >= U
    average rating,                       if similarity < U
(5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were observed towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
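The round-by-round simulation described above can be sketched as follows; build_prototype and predict stand in for the Rocchio components and are assumptions for illustration, not taken from the thesis code:

```python
def learning_curve(user_reviews, build_prototype, predict, max_train=40):
    """Simulate incremental learning for one user: each round moves one
    more review into the training set, rebuilds the profile, and measures
    the MAE over the reviews still held out."""
    errors = []
    for n in range(1, max_train + 1):
        train, validation = user_reviews[:n], user_reviews[n:]
        if not validation:
            break
        profile = build_prototype(train)
        abs_errors = [abs(predict(profile, item) - rating)
                      for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors
```

The per-user error sequences can then be averaged over the selected user group to produce the curves in Figs. 5.8 to 5.10.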
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in the dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the combination of all features outperforms every feature individually, as well as the pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.

The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors, created for each user, could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute a predicted rating to it.
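A minimal sketch of this class-vector variant, with one prototype vector per rating value (all names illustrative, assuming sparse dict vectors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def predict_rating_by_class(recipe_vec, class_vectors):
    """Return the rating value whose class vector is most similar to the
    recipe, so no separate similarity-to-rating conversion is needed.
    `class_vectors` maps a rating value (e.g. 1..5) to the prototype
    vector built from the recipes the user rated with that value."""
    return max(class_vectors, key=lambda rating: cosine(recipe_vec, class_vectors[rating]))
```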
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 978-0-521-49336-9.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
influence of the positive and negative examples. The document dj is assigned to the class ci with the highest similarity value between the prototype vector of class ci and the document vector of dj.
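For reference, a minimal sketch of Rocchio prototype construction and classification over sparse term-weight vectors, using conventional beta/gamma weighting (an illustration, not the thesis implementation):

```python
import math

def rocchio_prototype(positives, negatives, beta=16.0, gamma=4.0):
    """Build a Rocchio prototype: a weighted mean of the positive example
    vectors minus a weighted mean of the negative ones. Vectors are sparse
    dicts (term -> weight); beta/gamma here are conventional defaults."""
    proto = {}
    for docs, coef in ((positives, beta / max(len(positives), 1)),
                       (negatives, -gamma / max(len(negatives), 1))):
        for doc in docs:
            for term, weight in doc.items():
                proto[term] = proto.get(term, 0.0) + coef * weight
    return proto

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def classify(doc_vec, prototypes):
    """Assign the document to the class whose prototype is most similar."""
    return max(prototypes, key=lambda c: cosine(doc_vec, prototypes[c]))
```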
Although this method has an intuitive justification, it does not have any theoretical underpinnings, and there are no performance or convergence guarantees [7]. In the general area of machine learning, a family of online algorithms known as passive-aggressive classifiers, of which the perceptron is a well-known example, shares many similarities with Rocchio's method and has been studied extensively [8].
Classifiers
Aside from the keyword-based techniques presented above, Bayesian classifiers and various machine learning methods are other examples of techniques also used to perform content-based recommendation. These approaches use probabilities gathered from previously observed data in order to classify an object. The Naive Bayes classifier is recognized as an exceptionally well-performing text classification algorithm [7]. This classifier estimates the probability P(c|d) of a document d belonging to a class c, using a set of probabilities previously calculated from the observed data, or training data as it is commonly called. These probabilities are:

• P(c): probability of observing a document in class c;
• P(d|c): probability of observing the document d given a class c;
• P(d): probability of observing the document d.

Using these probabilities, the probability P(c|d) of having a class c given a document d can be estimated by applying the Bayes theorem:

P(c|d) = P(c) P(d|c) / P(d)    (2.7)

When performing classification, each document d is assigned to the class cj with the highest probability:

argmax_cj P(cj) P(d|cj) / P(d)    (2.8)
The probability P(d) is usually removed from the equation as it is equal for all classes and thus
does not influence the final result Classes could simply represent for example relevant or irrelevant
documents
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|c_j) is determined from individual word occurrences, rather than from the document as a whole. This simplification is needed because it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification relies on a conditional independence assumption that is clearly violated in practice, since the terms in a document are not independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, i.e., whether or not the word appears in the document. The second, typically referred to as the multinomial event model, counts the number of times each word appears in the document. Both models see the document as a vector of values over a vocabulary V, and both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. The multinomial model is represented by the following equation:
P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i,t_k)}    (2.9)

In the formula, N(d_i, t_k) represents the number of times the word (or term) t_k appears in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k ∈ V_{d_i}) are used.
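As an illustration of the multinomial event model in Eq. (2.9), the following sketch implements training and classification with simple word counts. The documents, class labels, and the use of Laplace smoothing are assumptions made for the example, not taken from the original text; logarithms are used to avoid numeric underflow.

```python
import math
from collections import Counter

def train_multinomial_nb(docs):
    """docs: list of (class_label, list_of_words). Returns class priors and
    Laplace-smoothed word likelihoods, as in the multinomial event model."""
    vocab = {w for _, words in docs for w in words}
    classes = {c for c, _ in docs}
    priors = {c: sum(1 for lbl, _ in docs if lbl == c) / len(docs) for c in classes}
    likelihood = {}
    for c in classes:
        counts = Counter(w for lbl, words in docs if lbl == c for w in words)
        total = sum(counts.values())
        likelihood[c] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return priors, likelihood, vocab

def classify(doc, priors, likelihood, vocab):
    """Assign the class maximising P(c) * prod P(t_k|c)^N(d,t_k); P(d) is
    dropped, since it is equal for all classes."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w, n in Counter(doc).items():
            if w in vocab:
                score += n * math.log(likelihood[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

# Illustrative training data: "relevant" vs. "irrelevant" documents.
training = [
    ("relevant", ["pasta", "tomato", "basil"]),
    ("relevant", ["pasta", "cheese"]),
    ("irrelevant", ["stocks", "market"]),
]
priors, likelihood, vocab = train_multinomial_nb(training)
print(classify(["pasta", "tomato"], priors, likelihood, vocab))
```
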
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of documents, the tree's internal nodes represent labelled terms, branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion for learning trees for text classification is usually the expected information gain [10].
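The expected information gain criterion can be sketched as follows for a binary (relevant/irrelevant) split on a term's presence or absence; the document counts below are hypothetical.

```python
import math

def entropy(pos, neg):
    """Binary class entropy in bits."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(pos, neg, pos_with, neg_with):
    """Expected information gain of splitting on a term's presence.
    pos/neg: class counts overall; pos_with/neg_with: class counts among
    the documents that contain the term."""
    total = pos + neg
    with_term = pos_with + neg_with
    without = total - with_term
    h_before = entropy(pos, neg)
    h_after = (with_term / total) * entropy(pos_with, neg_with) \
            + (without / total) * entropy(pos - pos_with, neg - neg_with)
    return h_before - h_after

# Hypothetical corpus: 6 relevant and 6 irrelevant documents; the candidate
# term occurs in 5 relevant documents and 1 irrelevant document.
print(round(information_gain(6, 6, 5, 1), 3))
```
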
Nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of the nearest neighbors. The similarity function used by the
algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is inefficiency at classification time: since they have no training phase, all the computation takes place when classifying.
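A minimal sketch of nearest-neighbor classification with cosine similarity over term-frequency vectors; the training examples and labels are illustrative.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def knn_classify(item, training, k=3):
    """Compare the new item to every stored example, keep the k most
    similar, and take the majority class label among them."""
    ranked = sorted(training, key=lambda ex: cosine(item, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Illustrative training set of term-frequency vectors with class labels.
training = [
    (Counter(["pasta", "tomato"]), "liked"),
    (Counter(["pasta", "cream"]), "liked"),
    (Counter(["liver", "onion"]), "disliked"),
]
print(knn_classify(Counter(["pasta", "tomato", "basil"]), training, k=1))
```
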
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often impractical due to limited resources. Recommended items will also not be significantly different from anything the user has seen before: if only items that score highly against a user's profile can be recommended, the similarity between them will be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and it assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.
Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since changes in item characteristics do not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned contains the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. A similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an m-dimensional space, where m represents the number of rated items in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s}\, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2}\,\sqrt{\sum_{s \in S} r_{b,s}^2}}    (2.10)
In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour.
In Figure 2.2 it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically constant. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S}(r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S}(r_{a,s} - \bar{r}_a)^2}\,\sqrt{\sum_{s \in S}(r_{b,s} - \bar{r}_b)^2}}    (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (2.12)
In the formula, pred(a, p) is the predicted rating for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
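Eqs. (2.11) and (2.12) can be sketched as follows, using Alice and two of the neighbours from Table 2.1 (User3 and User4 are omitted for brevity). Taking each neighbour's mean over their rated items excluding the target item is one possible reading of Eq. (2.12), assumed here for the example.

```python
import math

ratings = {  # subset of Table 2.1; Alice has no rating for Item5
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
}

def pearson(a, b):
    """Pearson correlation over the items rated by both users (Eq. 2.11)."""
    common = set(ratings[a]) & set(ratings[b])
    ra = sum(ratings[a][s] for s in common) / len(common)
    rb = sum(ratings[b][s] for s in common) / len(common)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common)) * \
          math.sqrt(sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, neighbours):
    """Similarity-weighted deviation from the neighbours' means, added to
    the user's own mean rating (Eq. 2.12)."""
    ra = sum(ratings[a].values()) / len(ratings[a])
    num = den = 0.0
    for b in neighbours:
        rb = sum(v for i, v in ratings[b].items() if i != item) / \
             (len(ratings[b]) - 1)
        num += pearson(a, b) * (ratings[b][item] - rb)
        den += pearson(a, b)
    return ra + num / den

print(round(predict("Alice", "Item5", ["User1", "User2"]), 2))
```

Both neighbours rated Item5 above their own averages, so the prediction lands above Alice's average rating of 4.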
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance, and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence suggests that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (i.e., training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods: the system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section; other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, as the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, one should recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is then exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be (and have been) studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation task is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}    (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = \frac{tp}{tp + fn}    (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|    (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}    (2.16)
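Both measures follow directly from Eqs. (2.15) and (2.16); the rating values in this sketch are illustrative.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15): average absolute deviation."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16): squaring the deviations makes
    a single large error weigh more than several small ones."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.5, 3.0, 5.0, 2.0]
actual    = [4,   3,   3,   2]
print(mae(predicted, actual))
print(rmse(predicted, actual))
```

On this example the RMSE exceeds the MAE, reflecting the one prediction that misses by two full rating points.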
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy (RMSE) improvement of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.

¹http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^{+}_{k}, an equation based on the idea of TF-IDF is used:

I^{+}_{k} = FF_k \times IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D}    (3.2)
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k}    (3.3)
The user's disliked ingredients I^{-}_{k} are estimated by considering the ingredients in the browsing history with which the user has never cooked.
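Eqs. (3.1)-(3.3) can be sketched as follows. The counts are hypothetical, and a natural logarithm is assumed, since the base of the logarithm is not specified in the source.

```python
import math

def favourite_score(f_k, period_days, m_total, m_with_k):
    """I+_k = FF_k * IRF_k: the frequency of use of ingredient k over a
    period, weighted by its inverse recipe frequency (Eqs. 3.1-3.3)."""
    ff = f_k / period_days              # Eq. 3.2
    irf = math.log(m_total / m_with_k)  # Eq. 3.3 (natural log assumed)
    return ff * irf

# Hypothetical user: cooked with basil 6 times in 30 days; basil appears
# in 50 of 1000 recipes, while salt appears in 900 of 1000.
print(favourite_score(6, 30, 1000, 50))   # rare ingredient: high score
print(favourite_score(6, 30, 1000, 900))  # ubiquitous ingredient: low score
```

As with TF-IDF, an ingredient used often but present in almost every recipe (like salt) scores far lower than an equally frequent but rarer one.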
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I^{+}_{k} were computed. The F-measure is computed as follows:

F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall}    (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^{+}_{k} for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent: 100 grams of pepper, for example, have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:

¹http://cookpad.com
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2}    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^{+}_{k} and I^{-}_{k}, respectively):
Score(R) = \sum_{k \in R} I_k \cdot W_k    (3.6)
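A sketch of Eqs. (3.5) and (3.6). The source does not fully specify how the weight W_k is derived from the deviation score, so a simple normalised deviation (standard deviation divided by mean quantity) is assumed here, along with illustrative quantities and preference values.

```python
import math

def std_dev(quantities):
    """Population standard deviation of an ingredient's quantity across
    the recipes that contain it (Eq. 3.5)."""
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe, preferences, weights):
    """Score(R) = sum over the recipe's ingredients of I_k * W_k (Eq. 3.6);
    I_k is positive for liked and negative for disliked ingredients."""
    return sum(preferences[k] * weights[k] for k in recipe if k in preferences)

# Hypothetical quantity data (grams) per recipe for each ingredient. The
# weight here (std dev / mean) is an assumption: ingredients whose quantity
# varies more, relative to the usual amount, matter more to the score.
quantities = {"pepper": [2, 3, 2, 3], "potato": [150, 250, 200, 200]}
weights = {k: std_dev(q) / (sum(q) / len(q)) for k, q in quantities.items()}
preferences = {"pepper": -0.6, "potato": 0.4}  # I_k values, illustrative
print(recipe_score(["pepper", "potato"], preferences, weights))
```
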
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension of this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm: user-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}    (3.7)
Using the pseudo user-ratings vectors of all users, a dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
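The pseudo user-ratings vector of Eq. (3.7) can be sketched as follows; the items, ratings, and content-based predictions are illustrative, and the content-based predictor itself is not shown.

```python
def pseudo_vector(actual, content_pred, items):
    """Pseudo user-ratings vector (Eq. 3.7): the actual rating r_ui where
    the user rated the item, the content-based prediction c_ui otherwise."""
    return {i: actual.get(i, content_pred[i]) for i in items}

# Hypothetical data: a user rated two of four movies; a content-based
# predictor supplies the missing entries, yielding a dense vector.
items = ["m1", "m2", "m3", "m4"]
actual = {"m1": 5, "m3": 2}
content_pred = {"m1": 4, "m2": 3, "m3": 2, "m4": 4}
print(pseudo_vector(actual, content_pred, items))
```

Stacking these vectors for all users yields the dense pseudo ratings matrix V on which the Pearson similarities are computed.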
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to consistently perform better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
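This simple decomposition can be sketched as follows, with hypothetical recipes and ratings: each ingredient inherits the average rating of the recipes containing it.

```python
from collections import defaultdict

def ingredient_scores(recipe_ratings, recipe_ingredients):
    """Break recipes down into ingredients: each ingredient's score is the
    average rating of the recipes in which it occurs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in recipe_ratings.items():
        for ing in recipe_ingredients[recipe]:
            totals[ing] += rating
            counts[ing] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}

# Hypothetical user ratings and recipe compositions.
recipe_ratings = {"carbonara": 5, "liver_pie": 1}
recipe_ingredients = {
    "carbonara": ["pasta", "egg", "bacon"],
    "liver_pie": ["liver", "egg", "flour"],
}
print(ingredient_scores(recipe_ratings, recipe_ingredients))
```

Note how the shared ingredient ("egg") ends up with an averaged score, even though it may not be what drove either rating, which is exactly the weakness the article's "intelligent" strategy tries to address.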
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores; with the user's ingredient scores, a prediction is then computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the ingredient-level scores after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve: as mentioned earlier, there are many other factors that influence a user's rating which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction: in some cases, items that are too similar to others already seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information, and will not help the user gather more information about a particular news topic; these items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. The use of similarity can thus be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory; when classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
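The short-term prediction scheme described above can be sketched as follows; the thresholds, vectors, and scores are illustrative, and the similarity function is a plain cosine rather than Daily-Learner's exact implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def predict_story(new_story, rated_stories, min_sim=0.3, max_sim=0.95):
    """Stories more similar than min_sim become voters; the prediction is
    the similarity-weighted average of the voters' scores. A voter above
    max_sim marks the story as already known; no voters means the story
    is deferred to the long-term model (None)."""
    voters = [(cosine(new_story, vec), score)
              for vec, score in rated_stories
              if cosine(new_story, vec) > min_sim]
    if not voters:
        return None          # defer to the long-term model
    if any(sim > max_sim for sim, _ in voters):
        return "known"       # user is assumed to already know the event
    num = sum(sim * score for sim, score in voters)
    den = sum(sim for sim, _ in voters)
    return num / den

# Illustrative TF-IDF-like vectors for previously scored stories.
rated = [([0.9, 0.1, 0.0], 4.0), ([0.5, 0.5, 0.2], 2.0)]
print(predict_story([0.2, 0.9, 0.1], rated))
```

Here both stored stories clear the minimum threshold, so the prediction is pulled towards the score of the closer one.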
This issue should be taken into consideration in food recommendation as well, since users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy obtained with the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach, explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the same group of users.

1 https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating given by user p to recipe a, P is the set of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (4.2)

2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the weighted sum is normalized by the sum of the similarities.
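The two computations above can be sketched in a few lines of Python. This is a minimal illustration, not the YoLP implementation: the dict-of-ratings layout and the function names are assumptions, and, for simplicity, the prediction weights the user's raw ratings (a common variant of Eq. 4.2) rather than mean-centred deviations.

```python
import math

def pearson_item_sim(ratings, a, b):
    """Pearson correlation between items a and b (Eq. 4.1).
    `ratings` maps (user, item) -> rating value."""
    users_a = {u for (u, i) in ratings if i == a}
    users_b = {u for (u, i) in ratings if i == b}
    common = users_a & users_b  # P: users that rated both a and b
    if not common:
        return 0.0
    # Item averages are computed over all of each item's ratings
    avg_a = sum(ratings[(u, a)] for u in users_a) / len(users_a)
    avg_b = sum(ratings[(u, b)] for u in users_b) / len(users_b)
    num = sum((ratings[(u, a)] - avg_a) * (ratings[(u, b)] - avg_b) for u in common)
    den = (math.sqrt(sum((ratings[(u, a)] - avg_a) ** 2 for u in common)) *
           math.sqrt(sum((ratings[(u, b)] - avg_b) ** 2 for u in common)))
    return num / den if den else 0.0

def predict(ratings, user, a):
    """Similarity-weighted average over the user's rated items (cf. Eq. 4.2)."""
    rated = [i for (u, i) in ratings if u == user and i != a]
    sims = {b: pearson_item_sim(ratings, a, b) for b in rated}
    num = sum(s * ratings[(user, b)] for b, s in sims.items())
    den = sum(abs(s) for s in sims.values())
    return num / den if den else 0.0
```

In YoLP, `rated` would be restricted to the recipes of the restaurant where the user is located, which is what makes the item-based variant cheap in practice.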
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are: category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are: temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors.

The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
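The profile construction and the cosine comparison can be sketched as follows, representing the sparse binary vectors as Python sets; the function names and the data layout are assumptions for illustration, not the YoLP code.

```python
import math

def build_profile(rated_recipes):
    """User profile: the union of feature values from positively rated
    recipes (rating of 4 or 5), stored as a set, i.e., a sparse binary vector."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= 4:
            profile.update(features)
    return profile

def cosine_binary(profile, recipe_features):
    """Cosine similarity between two sparse binary vectors (cf. Eq. 2.5):
    for binary vectors the dot product is the size of the intersection."""
    if not profile or not recipe_features:
        return 0.0
    overlap = len(profile & recipe_features)
    return overlap / (math.sqrt(len(profile)) * math.sqrt(len(recipe_features)))
```

Ranking the restaurant's recipes then amounts to sorting them by `cosine_binary(profile, recipe)` in descending order.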
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
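Computing the IRF weights over a dataset is straightforward; the sketch below is an illustrative assumption (recipes as collections of feature values), not the actual implementation.

```python
import math

def irf_weights(recipes):
    """Inverse Recipe Frequency (Eq. 4.4): IRF_k = log(M / M_k), where M is
    the total number of recipes and M_k the number containing feature k."""
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):  # count each feature once per recipe
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```

Rare features receive large weights and ubiquitous ones weights near zero, exactly as in the document-retrieval analogy.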
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
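The add-or-subtract update can be sketched as below, using the first (fixed-threshold) approach; the sparse-dict representation, the function name, and the default threshold are assumptions for illustration.

```python
def build_prototype(rated_recipes, weights, threshold=3):
    """Rocchio-style prototype vector as a sparse dict: feature IRF weights
    are added for positive observations (rating >= threshold) and subtracted
    for negative ones. Positive and negative observations carry the same
    weight (1), as in the experiments described in the text."""
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * weights.get(k, 0.0)
    return prototype
```

A feature the user has liked and disliked equally often ends up with a weight near zero, so it stops influencing the cosine comparison.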
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
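The per-user mapping, including the fallback to the user average, can be sketched as follows; the function signature is an assumption, but the formula is Eq. 4.5 and the fallback is the one described above.

```python
def min_max_rating(sim, user_sims, user_ratings):
    """Min-Max normalization (Eq. 4.5): map the user's observed similarity
    range [min(A), max(A)] onto the user's rating range [C, D]."""
    a_min, a_max = min(user_sims), max(user_sims)
    c, d = min(user_ratings), max(user_ratings)
    if a_max == a_min:
        # Not enough variation to define the similarity interval:
        # fall back to the user's average rating, as in the text.
        return sum(user_ratings) / len(user_ratings)
    return (sim - a_min) / (a_max - a_min) * (d - c) + c
```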
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user and recipe averages and standard deviations.
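The case analysis of Eq. 4.6 can be written as a small helper. The function name is an assumption; the default thresholds (U = 0.75, L = 0.25) are the initial values stated in the text.

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: bump the average up or down by one standard deviation,
    depending on which similarity band the recommendation falls in."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```

Any of the three averaging strategies (user, recipe, or combined) just changes which `avg` and `std` are passed in.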
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Foodcom    Epicurious
Number of users                        24741        8117
Number of food items                  226025       14976
Number of rating events               956826       86574
Number of ratings above avg           726467       46588
Number of groups                         108          68
Number of ingredients                   5074         338
Number of categories                      28          14
Sparsity of the ratings matrix         0.02%       0.07%
Avg rating value                        4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Foodcom rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when performing a review.
In Figures 4.3, 4.4 and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items, for the Epicurious dataset. This last graph is not presented for the Foodcom dataset, because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed, to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data; instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using a different set of observations as the validation set each time; ideally, it is repeated until all possible combinations have been tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the process is repeated 5 times, which is also known as 5-fold cross-validation: for each fold, the validation set represents 20% and the training set the remaining 80% of the data.
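The 5-fold split described above can be sketched in plain Python; this is an illustrative helper, not the evaluation module itself, and the function name and fixed seed are assumptions.

```python
import random

def k_fold_splits(events, k=5, seed=42):
    """Shuffle the rating events and split them into k folds; each fold
    serves once as the validation set (20% for k=5), with the remaining
    folds concatenated as the training set (80%)."""
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

Averaging the MAE and RMSE over the 5 (training, validation) pairs yields the final cross-validated scores.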
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set, and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
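The two error measures are direct transcriptions of their standard definitions; a minimal sketch:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between
    predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but the squaring places
    more emphasis on large deviations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```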
5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using the 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Foodcom
                                  MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                  Epicurious                            Foodcom
                                  Observation     Observation           Observation     Observation
                                  User Average    Fixed Threshold       User Average    Fixed Threshold
                                  MAE     RMSE    MAE     RMSE          MAE     RMSE    MAE     RMSE
User Avg + User Std. Dev.         0.8217  1.0606  0.7759  1.0283        0.4448  0.6812  0.4287  0.6624
Item Avg + Item Std. Dev.         0.8914  1.1550  0.8388  1.1106        0.4561  0.7251  0.4507  0.7207
User/Item Avg + User and
Item Std. Dev.                    0.8304  1.0296  0.7824  0.9927        0.4390  0.6506  0.4324  0.6449
Min-Max                           0.8539  1.1533  0.7721  1.0705        0.6648  0.9847  0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2, and are referred to as: User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                   Epicurious          Foodcom
                                   MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616     0.4324   0.7087
Ingredients                        0.7932   1.0054     0.4411   0.6537
Cuisine                            0.8553   1.0810     0.5357   0.7431
Dietary                            0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
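The merging of the three stored per-feature-type vectors can be sketched as follows; the nested-dict layout and function name are assumptions, and the merge relies on the feature types having disjoint keys.

```python
def merge_prototypes(user_vectors, active_features):
    """Merge the separately stored per-feature-type prototype vectors
    (e.g., 'ingredients', 'cuisine', 'dietary') into a single sparse vector
    for the feature combination under test. Feature types are assumed to
    have disjoint keys, so dict.update never clobbers an entry."""
    merged = {}
    for feature_type in active_features:
        merged.update(user_vectors.get(feature_type, {}))
    return merged
```

Each row of Table 5.3 then corresponds to one choice of `active_features`, with no prototype rebuilding required.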
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset
5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using Foodcom dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using Foodcom dataset
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229, and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact on Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100 recipes, and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so, for each round, an additional review is added to the training set and removed from the validation set, in order to simulate the learning process.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test.

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes

Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes marking a threshold where the recommendation error stagnates.
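The simulation loop just described can be sketched as follows. This is a minimal illustration, where a simple user-mean predictor stands in for the actual Rocchio recommender; the `learning_curve` function and its data are hypothetical:

```python
from statistics import mean

def learning_curve(reviews, rounds):
    """Simulate incremental learning: each round moves one more review into
    the training set (removing it from the validation set) and re-evaluates
    the error on the remaining validation reviews.

    `reviews` is a list of (recipe_id, rating) pairs for a single user.
    """
    errors = []
    for n in range(1, rounds + 1):
        train, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        # Stand-in "prototype": predict the mean of the training ratings.
        prediction = mean(r for _, r in train)
        mae = mean(abs(prediction - r) for _, r in validation)
        errors.append(mae)
    return errors

# Hypothetical rating history for one user.
history = [("r1", 5), ("r2", 4), ("r3", 5), ("r4", 2), ("r5", 4), ("r6", 5)]
curve = learning_curve(history, rounds=5)
```

Averaging such curves over all users with enough rated recipes yields plots like Figs. 5.8 to 5.10.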
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested, in order to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. These approaches, combined, returned the best performance values of the experimental recommendation component.
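A minimal sketch of this prototype-building step is shown below. The fixed threshold of 3 and the beta/gamma weights are illustrative assumptions, not the exact values used in this work:

```python
THRESHOLD = 3.0        # assumed fixed threshold separating positive/negative
BETA, GAMMA = 1.0, 0.5  # illustrative Rocchio weights

def build_prototype(rated, dim):
    """Build a user's Rocchio prototype from (recipe_vector, rating) pairs.

    Recipe vectors are lists of feature weights (e.g., IRF-weighted).
    Positive observations pull the prototype towards them; negative
    observations push it away.
    """
    pos = [v for v, r in rated if r > THRESHOLD]
    neg = [v for v, r in rated if r <= THRESHOLD]
    proto = [0.0] * dim
    for group, w in ((pos, BETA), (neg, -GAMMA)):
        if not group:
            continue
        for vec in group:
            for i, x in enumerate(vec):
                proto[i] += w * x / len(group)
    return proto

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

# Two toy recipes over three features, rated 5 (liked) and 2 (disliked).
rated = [([1.0, 0.0, 0.5], 5), ([0.0, 1.0, 0.0], 2)]
proto = build_prototype(rated, dim=3)
```

Candidate recipes would then be ranked by the cosine similarity between their feature vectors and this prototype, before the similarity-to-rating transformation described above.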
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed, to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; adding the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), total meal cost, or total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors, created for each user, could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
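This future-work idea can be sketched as follows; the `build_class_vectors` and `predict` helpers are hypothetical illustrations, not an implemented component:

```python
from collections import defaultdict

def build_class_vectors(rated, dim):
    """One class vector per rating value: each accumulates the feature
    weights of the recipes the user gave that rating to.

    `rated` is a list of (recipe_vector, rating) pairs.
    """
    classes = defaultdict(lambda: [0.0] * dim)
    for vec, rating in rated:
        for i, x in enumerate(vec):
            classes[rating][i] += x
    return dict(classes)

def predict(classes, recipe_vec):
    """Predicted rating = the class whose vector is most similar (by cosine)
    to the target recipe's feature vector."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
        return dot / norm if norm else 0.0
    return max(classes, key=lambda r: cosine(classes[r], recipe_vec))

# Toy example over two features: two loved recipes, one hated one.
rated = [([1.0, 0.0], 5), ([0.9, 0.1], 5), ([0.0, 1.0], 1)]
classes = build_class_vectors(rated, dim=2)
```

A new recipe similar to the loved ones would fall into the class for rating 5, which directly becomes its predicted rating.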
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in Computer Science and Information Systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
The probability P(d) is usually removed from the equation, as it is equal for all classes and thus does not influence the final result. Classes could simply represent, for example, relevant or irrelevant documents.
In order to generate good probabilities, the Naive Bayes classifier assumes that P(d|cj) is determined based on individual word occurrences, rather than on the document as a whole. This simplification is needed due to the fact that it is very unlikely to see the exact same document more than once; without it, the observed data would not be enough to generate good probabilities. Although this simplification clearly violates the conditional independence assumption, since the terms in a document are not theoretically independent from each other, experiments show that the Naive Bayes classifier achieves very good results when classifying text documents. Two different models are commonly used when working with the Naive Bayes classifier. The first, typically referred to as the multivariate Bernoulli event model, encodes each word as a binary attribute, relating to the appearance of the word in a document. The second, typically referred to as the multinomial event model, counts the number of times the words appear in the document. These models see the document as a vector of values over a vocabulary V, and they both lose the information about word order. Empirically, the multinomial Naive Bayes formulation was shown to outperform the multivariate Bernoulli model, especially for large vocabularies [9]. This model is represented by the following equation:
P(c_j | d_i) = P(c_j) \prod_{t_k \in V \cap d_i} P(t_k | c_j)^{N(d_i, t_k)}   (2.9)

In the formula, N(d_i, t_k) represents the number of times the word, or term, t_k appeared in document d_i. Therefore, only the words from the vocabulary V that appear in the document (t_k \in V \cap d_i) are used.
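Eq. 2.9 can be sketched in code as follows. The log-space computation and the add-one (Laplace) smoothing are standard practical assumptions not mentioned in the text above; the toy documents are hypothetical:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (class_label, list_of_terms). Returns class priors,
    per-class term counts (the multinomial event model), and the vocabulary."""
    priors, counts, vocab = Counter(), {}, set()
    for label, terms in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(terms)
        vocab.update(terms)
    return priors, counts, vocab

def classify(doc_terms, priors, counts, vocab):
    """argmax_c of log P(c) + sum_k N(d, t_k) * log P(t_k | c), i.e. Eq. 2.9
    in log space, with add-one smoothing (an assumption made here)."""
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for label in priors:
        score = math.log(priors[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for term, n in Counter(doc_terms).items():
            if term not in vocab:
                continue  # only words from the vocabulary V are used
            score += n * math.log((counts[label][term] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("relevant", ["pasta", "tomato", "basil"]),
        ("relevant", ["pasta", "cheese"]),
        ("irrelevant", ["engine", "oil"])]
priors, counts, vocab = train(docs)
```

Note how the exponent of Eq. 2.9 becomes a multiplication by the term count once the product is moved into log space.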
Decision trees and nearest neighbor methods are other examples of important learning algorithms used in content-based recommendation systems. Decision tree learners build a decision tree by recursively partitioning the training data into subgroups, until those subgroups contain only instances of a single class. In the case of documents, the tree's internal nodes represent labelled terms, branches originating from them are labelled according to tests done on the weight that the term has in the document, and leaves are labelled by categories. Instead of using weights, a partition can also be formed based on the presence or absence of individual words. The attribute selection criterion when learning trees for text classification is usually the expected information gain [10].
Nearest neighbor algorithms simply store all the training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items, using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. The class label for the unclassified item is derived from the class labels of these nearest neighbors. The similarity function used by the algorithm depends on the type of data: the Euclidean distance metric is often chosen when working with structured data, while for items represented using the VSM the cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they have no training phase, and all the computation is made at classification time.
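A minimal sketch of such a classifier, using the cosine similarity over VSM-style vectors and a majority vote among the k neighbors (the voting rule is an assumption, since the text does not fix one):

```python
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

def knn_classify(item, training, k=3):
    """training: list of (vector, label) pairs. All the work happens at
    classification time: compare the item against every stored example,
    keep the k most similar, and take a majority vote over their labels."""
    neighbors = sorted(training, key=lambda ex: cosine(item, ex[0]),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical term-weight vectors for labelled documents.
training = [([1.0, 0.0, 0.1], "sports"), ([0.9, 0.1, 0.0], "sports"),
            ([0.0, 1.0, 0.2], "politics"), ([0.1, 0.9, 0.0], "politics")]
```

The absence of a training phase is visible above: `training` is stored as-is, and every call to `knn_classify` scans the whole collection.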
These algorithms represent some of the most important methods used in content-based recommendation systems; a thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended objects and, when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before; moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user, based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by the users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N users most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

          Item1   Item2   Item3   Item4   Item5
  Alice     5       3       4       4       ?
  User1     3       1       2       3       3
  User2     4       3       4       3       5
  User3     3       3       1       5       4
  User4     1       5       5       2       1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned contains the ratings given by User1, User2, User3 and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in the set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. A similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an m-dimensional space, where m represents the number of rated items they have in common. The similarity measure results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{a,s} \, r_{b,s}}{\sqrt{\sum_{s \in S} r_{a,s}^2} \sqrt{\sum_{s \in S} r_{b,s}^2}}   (2.10)

In the formula, r_{a,s} is the rating that user a gave to item s, and r_{b,s} is the rating that user b gave to the same item. However, this measure does not take an important factor into consideration, namely the differences in user rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analysed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2 \sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}}   (2.11)

In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction, using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}   (2.12)
In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to, or subtracted from, Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
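The whole procedure can be reproduced on the data of Table 2.1. The sketch below computes the Pearson similarities (Eq. 2.11) over the four common items, and then the prediction of Eq. 2.12; using the two most similar users as the neighborhood N is an assumption made for illustration:

```python
# Ratings database from Table 2.1 (Item5 is unknown to Alice).
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def mean(xs):
    xs = [x for x in xs if x is not None]
    return sum(xs) / len(xs)

def pearson(a, b):
    """Eq. 2.11, computed over the items rated by both users."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    ma = mean([x for x, _ in common])
    mb = mean([y for _, y in common])
    num = sum((x - ma) * (y - mb) for x, y in common)
    den = (sum((x - ma) ** 2 for x, _ in common)
           * sum((y - mb) ** 2 for _, y in common)) ** 0.5
    return num / den if den else 0.0

def predict(user, item, neighbors):
    """Eq. 2.12: the user's mean plus similarity-weighted neighbor deviations."""
    ra = mean(ratings[user])
    num = den = 0.0
    for b in neighbors:
        s = pearson(ratings[user], ratings[b])
        num += s * (ratings[b][item] - mean(ratings[b]))
        den += s
    return ra + num / den

# Predict Alice's rating for Item5 (index 4), with her two most similar users.
pred = predict("Alice", 4, ["User1", "User2"])  # approximately 4.87
```

User1 and User2 rate like Alice does (similarities of about 0.85 and 0.71), and both rated Item5 above their own averages, so Alice's prediction lands well above her average of 4.0.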
Different recommendation systems may take different approaches, in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all the user similarities sim(a, b) in advance, and to recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (i.e., training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various other probabilistic modelling techniques.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items, in order to perform accurate recommendations. Several techniques have been proposed to address this problem: most of them use the hybrid recommendation approach presented in the next section, while others use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some of the shortcomings, and even to reach desirable properties not present in the individual approaches. Monolithic, parallelized and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features, or knowledge sources, from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., the movies liked by a user) can be associated with content features (e.g., the comedies, or the dramas, liked by a user), in order to improve the results.

Figure 2.3: Monolithic hybridization design [2]

Figure 2.4: Parallelized hybridization design [2]

Figure 2.5: Pipelined hybridization designs [2]
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
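A minimal sketch of the parallelized design, with two hypothetical stand-in components and manually assigned, equal weights:

```python
def content_score(user, item):
    # Hypothetical stand-in for a content-based component's output in [0, 1].
    return {"i1": 0.9, "i2": 0.2}.get(item, 0.0)

def collaborative_score(user, item):
    # Hypothetical stand-in for a collaborative component's output in [0, 1].
    return {"i1": 0.4, "i2": 0.8}.get(item, 0.0)

def hybrid_score(user, item, w_content=0.5, w_collab=0.5):
    """Parallelized hybrid: both components see the same input, and a
    weighting scheme combines their outputs (the weights could also be
    learned dynamically instead of fixed)."""
    return (w_content * content_score(user, item)
            + w_collab * collaborative_score(user, item))

score = hybrid_score("alice", "i1")  # 0.5 * 0.9 + 0.5 * 0.4
```

Swapping the fixed weights for situation-dependent ones (e.g., down-weighting the collaborative component when few ratings exist) turns this into the complementary scheme described above.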
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2], namely cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is then exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance, content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].

Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can and have been studied: the increase in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and the click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Figure 2.7: Evaluating recommended items [2]

Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):

Precision = \frac{tp}{tp + fp}   (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

Recall = \frac{tp}{tp + fn}   (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|   (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}   (2.16)
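The four measures (Eqs. 2.13 to 2.16) translate directly into code; the counts and ratings below are hypothetical:

```python
def precision(tp, fp):
    return tp / (tp + fp)          # Eq. 2.13

def recall(tp, fn):
    return tp / (tp + fn)          # Eq. 2.14

def mae(predicted, actual):        # Eq. 2.15
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):       # Eq. 2.16
    return (sum((p - r) ** 2
                for p, r in zip(predicted, actual)) / len(actual)) ** 0.5

# Hypothetical example: 10 recommendations, of which 6 are relevant,
# while 2 relevant items were missed by the system.
p_at_10 = precision(tp=6, fp=4)    # 0.6
r_at_10 = recall(tp=6, fn=2)       # 0.75
err_mae = mae([4.0, 3.5, 2.0], [5, 3, 2])
err_rmse = rmse([4.0, 3.5, 2.0], [5, 3, 2])
```

On the same predictions, RMSE is never smaller than MAE, reflecting the extra weight it gives to the largest deviations.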
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in terms of RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.

¹ http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients, I^{+}_{k}, an equation based on the idea of TF-IDF is used:

I^{+}_{k} = FF_k \times IRF_k   (3.1)

FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D}   (3.2)

The notion of IDF (inverse document frequency) is specified in Eq. (3.3), through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k}   (3.3)
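Eqs. 3.1 to 3.3 can be sketched as follows. The ingredient counts are hypothetical, and the natural logarithm is an assumption here, since the logarithm base is not restated above:

```python
import math

def favourite_score(f_k, period, m_total, m_with_k):
    """I+_k = FF_k * IRF_k (Eqs. 3.1 to 3.3): the frequency of use of
    ingredient k during a period, damped by how common that ingredient
    is across the whole recipe collection."""
    ff = f_k / period                   # Eq. 3.2: frequency of use
    irf = math.log(m_total / m_with_k)  # Eq. 3.3: inverse recipe frequency
    return ff * irf                     # Eq. 3.1

# Hypothetical cooking history over 30 days, in a collection of 1000 recipes:
# garlic was used 12 times but appears in 400 recipes; saffron was used only
# 3 times but appears in just 10 recipes.
garlic = favourite_score(f_k=12, period=30, m_total=1000, m_with_k=400)
saffron = favourite_score(f_k=3, period=30, m_total=1000, m_with_k=10)
```

Note how the IRF term boosts the rarer ingredient: saffron ends up with the higher score despite being used less often, mirroring the IDF intuition that common ingredients carry little preference information.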
The user's disliked ingredients, I^{-}_{k}, are estimated by considering the ingredients in the browsing history with which the user has never cooked.

To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and the subjects would choose one recipe they would like to browse completely, and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I^{+}_{k}, were computed. The F-measure is computed as follows:

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}   (3.4)

When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15 and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^{+}_{k} for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
$$\sigma_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(g_k(i) - \overline{g_k}\right)^2} \quad \text{(3.5)}$$
In the formula, n denotes the number of recipes that contain ingredient k, $g_k(i)$ denotes the quantity of ingredient k in recipe i, and $\overline{g_k}$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe R is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^{+}_{k}$ and $I^{-}_{k}$, respectively):
$$\text{Score}(R) = \sum_{k \in R} I_k \cdot W_k \quad \text{(3.6)}$$
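The deviation-based scoring can be sketched as follows. Note that the text above does not fully specify how the weight $W_k$ is derived from the deviation score, so the weight and preference values in the example are hypothetical placeholders:

```python
import math

def ingredient_std(quantities):
    """Standard deviation sigma_k of an ingredient's quantity across the
    n recipes that contain it (Eq. 3.5, population standard deviation)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe, preferences, weights):
    """Score(R) = sum over ingredients k in R of I_k * W_k (Eq. 3.6).
    I_k > 0 for liked ingredients, I_k < 0 for disliked ones."""
    return sum(preferences.get(k, 0.0) * weights.get(k, 1.0) for k in recipe)

# Hypothetical observed quantities (grams) of "pepper" across four recipes.
sigma_pepper = ingredient_std([2.0, 4.0, 2.0, 4.0])
# Hypothetical preferences I_k and deviation-derived weights W_k.
score = recipe_score(["pepper", "potato"],
                     {"pepper": -0.5, "potato": 0.8},
                     {"pepper": 1.2, "potato": 0.9})
```

A disliked ingredient with a large weight can thus pull a recipe's score down even when the other ingredients are liked.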
The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions using the average of the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF essentially consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u, where available, and those predicted by the content-based method otherwise:
$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \quad \text{(3.7)}$$
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
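A minimal sketch of building the dense pseudo ratings matrix of Eq. (3.7); the dictionary-based layout and the constant stand-in content-based predictor are illustrative assumptions, not the implementation of [20]:

```python
def pseudo_ratings_matrix(actual, content_pred, users, items):
    """Build the dense pseudo ratings matrix V (Eq. 3.7): the actual rating
    r_ui where available, the content-based prediction c_ui otherwise."""
    return {
        u: {i: actual.get((u, i), content_pred(u, i)) for i in items}
        for u in users
    }

# Hypothetical data: two users, three movies, sparse actual ratings.
actual = {("alice", "m1"): 5, ("bob", "m2"): 2}
cb = lambda u, i: 3  # stand-in content-based predictor c_ui
V = pseudo_ratings_matrix(actual, cb, ["alice", "bob"], ["m1", "m2", "m3"])
```

The resulting matrix has no missing entries, so the subsequent Pearson-based collaborative step operates on dense data.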
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with an MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, the assumption being that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are presented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when computing a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information, and will not help the user gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest-neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest-neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as weights. If a voter is closer than a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
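The short-term voting scheme described above can be sketched as follows. The threshold values `min_sim` and `max_sim` are hypothetical, since they are not fixed here, and the sparse-dictionary TF-IDF vectors are an illustrative representation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_short_term(new_story, rated_stories, min_sim=0.3, max_sim=0.95):
    """Similarity-weighted average over 'voting' stories; stories above
    max_sim mark the new story as already known; no voters defers to the
    long-term model (returned as None here)."""
    voters = []
    for vec, score in rated_stories:
        sim = cosine(new_story, vec)
        if sim >= max_sim:
            return "known"   # user presumably already saw this event
        if sim >= min_sim:
            voters.append((sim, score))
    if not voters:
        return None          # pass the story on to the long-term model
    return sum(s * r for s, r in voters) / sum(s for s, _ in voters)

# Hypothetical TF-IDF vectors and user scores for previously rated stories.
new_story = {"a": 1.0, "b": 1.0}
rated = [({"a": 1.0, "b": 1.0, "c": 1.0}, 4.0), ({"x": 1.0}, 1.0)]
pred = predict_short_term(new_story, rated)
```

Only the first rated story is similar enough to vote, so the prediction equals its score.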
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation²
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
$$sim(a,b) = \frac{\sum_{p \in P}(r_{a,p} - \overline{r_a})(r_{b,p} - \overline{r_b})}{\sqrt{\sum_{p \in P}(r_{a,p} - \overline{r_a})^2}\sqrt{\sum_{p \in P}(r_{b,p} - \overline{r_b})^2}} \quad \text{(4.1)}$$
where a and b are recipes, $r_{a,p}$ is the rating from user p for recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, $\overline{r_a}$ and $\overline{r_b}$ are the average ratings of recipes a and b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
$$pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \cdot r_{u,b}}{\sum_{b \in N} sim(a,b)} \quad \text{(4.2)}$$
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of the similarities.
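A minimal sketch of Eqs. (4.1) and (4.2) over a nested-dictionary ratings structure; the data layout and the toy ratings are illustrative assumptions, not the YoLP implementation:

```python
import math

def item_similarity(ratings, a, b):
    """Pearson correlation between items a and b over co-rating users (Eq. 4.1)."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    avg_a = sum(ratings[a].values()) / len(ratings[a])
    avg_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - avg_a) * (ratings[b][p] - avg_b) for p in common)
    den = math.sqrt(sum((ratings[a][p] - avg_a) ** 2 for p in common)) * \
          math.sqrt(sum((ratings[b][p] - avg_b) ** 2 for p in common))
    return num / den if den else 0.0

def predict(ratings, user_ratings, a):
    """Similarity-weighted average of the user's own ratings (Eq. 4.2),
    keeping only positively correlated neighbor items."""
    pairs = [(item_similarity(ratings, a, b), r)
             for b, r in user_ratings.items() if b != a]
    pairs = [(s, r) for s, r in pairs if s > 0]
    if not pairs:
        return None
    return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)

# Hypothetical ratings: ratings[item][user] = rating value.
ratings = {"r1": {"u1": 5, "u2": 1},
           "r2": {"u1": 4, "u2": 2},
           "r3": {"u1": 1, "u2": 5}}
prediction = predict(ratings, {"r2": 4, "r3": 2}, "r1")
```

Here "r1" correlates perfectly with "r2" and negatively with "r3", so the prediction follows the user's rating of "r2".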
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
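A sketch of the binary-vector construction and cosine ranking; the feature space below is a hypothetical miniature of the category/region/ingredient features described above:

```python
import math

# Hypothetical feature space; each feature owns a fixed vector position.
FEATURES = ["cat:dessert", "cat:main", "region:north", "ing:potato", "ing:pepper"]
IDX = {f: i for i, f in enumerate(FEATURES)}

def to_vector(features):
    """Binary feature vector for a recipe (or a user profile)."""
    v = [0] * len(FEATURES)
    for f in features:
        v[IDX[f]] = 1
    return v

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Profile built from one positively rated recipe; a candidate is ranked
# by its similarity to that profile.
profile = to_vector(["cat:main", "ing:potato", "region:north"])
candidate = to_vector(["cat:main", "ing:potato", "ing:pepper"])
sim = cosine(profile, candidate)
```

In a real profile the binary values accumulate over all positively rated recipes; this single-recipe profile just keeps the example small.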
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In the collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
$$Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \quad \text{(4.3)}$$
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure can be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of a feature, $F_k$, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
$$IRF_k = \log\frac{M}{M_k} \quad \text{(4.4)}$$
where M is the total number of recipes and $M_k$ is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determines the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations, and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in the next section, Section 4.4. The second approach utilizes the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
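A sketch of the prototype construction with IRF weights and the fixed-threshold observation rule; the recipe data is hypothetical, and the threshold mimics the Epicurious case, where ratings of 1 and 2 are negative observations:

```python
import math
from collections import defaultdict

def irf_weights(recipes):
    """IRF_k = log(M / M_k) over the whole recipe collection (Eq. 4.4)."""
    M = len(recipes)
    counts = defaultdict(int)
    for feats in recipes.values():
        for k in set(feats):
            counts[k] += 1
    return {k: math.log(M / mk) for k, mk in counts.items()}

def build_prototype(rated, recipes, weights, neg_max=2):
    """Rocchio-style prototype vector: add feature weights for positive
    observations, subtract them for negative ones (equal weight of 1 for
    both, as in this work). Ratings <= neg_max count as negative."""
    proto = defaultdict(float)
    for recipe_id, rating in rated.items():
        sign = 1.0 if rating > neg_max else -1.0
        for k in recipes[recipe_id]:
            proto[k] += sign * weights[k]
    return dict(proto)

# Hypothetical toy data: recipe_id -> feature list, then one user's ratings.
recipes = {"r1": ["potato", "cheese"],
           "r2": ["potato", "pepper"],
           "r3": ["pepper", "basil"]}
w = irf_weights(recipes)
proto = build_prototype({"r1": 4, "r3": 1}, recipes, w)
```

Features of the positively rated recipe end up with positive prototype weights, and features of the disliked recipe with negative ones, which is exactly what the cosine comparison against candidate recipes will exploit.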
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula³:
$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \quad \text{(4.5)}$$
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way, or have the same notion of high or low rating values. Thus, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
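The per-user mapping can be sketched as follows, including the fallback to the user average when the similarity interval cannot be computed; the scale values in the example are hypothetical:

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Map a similarity value into the user's rating range via Min-Max
    normalization (Eq. 4.5); fall back to the user's average rating when
    the similarity interval is degenerate (too few ratings)."""
    if sim_max == sim_min:
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min

# Hypothetical per-user scales: similarities in [0.1, 0.9], ratings in [1, 5].
rating = min_max_rating(0.5, 0.1, 0.9, 1, 5, user_avg=3.5)
```

A mid-range similarity thus maps to a mid-range rating on that user's own scale.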
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:
$$Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \quad \text{(4.6)}$$
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
    Number of users                       24,741         8,117
    Number of food items                 226,025        14,976
    Number of rating events              956,826        86,574
    Number of ratings above avg          726,467        46,588
    Number of groups                         108            68
    Number of ingredients                  5,074           338
    Number of categories                      28            14
    Sparsity of the ratings matrix         0.02%         0.07%
    Avg rating value                        4.68          3.34
    Avg number of ratings per user         38.67         10.67
    Avg number of ratings per item          4.23          5.78
    Avg number of ingredients per item      8.57          3.71
    Avg number of categories per item       2.33          0.60
    Avg number of food groups per item      0.87          0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
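Eq. (4.6) with the initial thresholds translates directly into code; the average and standard deviation values in the example are hypothetical:

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Piecewise rating rule of Eq. 4.6, using the initial thresholds
    U = 0.75 and L = 0.25 mentioned in the text."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

# E.g., with a user average of 3.4 and a standard deviation of 0.8:
high = threshold_rating(0.9, 3.4, 0.8)   # yields avg + std
mid = threshold_rating(0.5, 3.4, 0.8)    # yields avg
low = threshold_rating(0.1, 3.4, 0.8)    # yields avg - std
```

The same function covers all three variants tested (user, recipe, and combined statistics); only the `avg` and `std` arguments change.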
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community⁴. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistics on the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
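The 5-fold procedure can be sketched as follows; the random shuffling and the tuple representation of rating events are illustrative assumptions:

```python
import random

def five_fold_splits(rating_events, seed=0):
    """Yield (train, validation) splits for 5-fold cross-validation:
    each fold holds out 20% of the rating events for validation and
    trains on the remaining 80%."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    folds = [events[i::5] for i in range(5)]
    for i in range(5):
        validation = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, validation

# Hypothetical rating events: (userID, itemID, rating) triples.
events = [("u%d" % (i % 7), "r%d" % i, 1 + i % 5) for i in range(100)]
splits = list(five_fold_splits(events))
```

Every rating event appears in exactly one validation fold, so the averaged metrics cover the whole dataset.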
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
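These two measures follow directly from their definitions; a minimal sketch, not the actual evaluation module:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 3, 5, 2]        # known ratings from the validation set
predicted = [3.5, 3, 4, 3]   # ratings generated by the recommender
print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```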
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                Epicurious        Foodcom
                                MAE     RMSE      MAE     RMSE
  YoLP Content-based component  0.6389  0.8279    0.3590  0.6536
  YoLP Collaborative component  0.6454  0.8678    0.3761  0.6834
  User Average                  0.6315  0.8338    0.4077  0.6207
  Item Average                  0.7701  1.0930    0.4385  0.7043
  Combined Average              0.6628  0.8572    0.4180  0.6250

Table 5.2: Test Results

                                     Epicurious                         Foodcom
                                     Obs: User Avg    Obs: Fixed Thr.  Obs: User Avg    Obs: Fixed Thr.
                                     MAE     RMSE     MAE     RMSE     MAE     RMSE     MAE     RMSE
  User Avg + User Standard Dev.      0.8217  1.0606   0.7759  1.0283   0.4448  0.6812   0.4287  0.6624
  Item Avg + Item Standard Dev.      0.8914  1.1550   0.8388  1.1106   0.4561  0.7251   0.4507  0.7207
  User/Item Avg + Standard Devs.     0.8304  1.0296   0.7824  0.9927   0.4390  0.6506   0.4324  0.6449
  Min-Max                            0.8539  1.1533   0.7721  1.0705   0.6648  0.9847   0.6303  0.9384
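These baseline predictors can be sketched as follows (hypothetical helper names; the fallback to a global average for unseen users or items is an assumption, not something stated in the text):

```python
from collections import defaultdict

def build_baselines(train):
    """Precompute averages from (userID, itemID, rating) training tuples."""
    user_r, item_r = defaultdict(list), defaultdict(list)
    for u, i, r in train:
        user_r[u].append(r)
        item_r[i].append(r)
    user_avg = {u: sum(v) / len(v) for u, v in user_r.items()}
    item_avg = {i: sum(v) / len(v) for i, v in item_r.items()}
    global_avg = sum(r for _, _, r in train) / len(train)
    return user_avg, item_avg, global_avg

def baseline_predict(user, item, user_avg, item_avg, global_avg, mode="combined"):
    """Return the User Average, Item Average, or Combined Average baseline."""
    ua = user_avg.get(user, global_avg)  # assumed fallback for unseen users
    ia = item_avg.get(item, global_avg)  # assumed fallback for unseen items
    if mode == "user":
        return ua
    if mode == "item":
        return ia
    return (ua + ia) / 2  # (UserAvg + ItemAvg)/2
```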
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation: User Average and Observation: Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious        Foodcom
                                      MAE     RMSE      MAE     RMSE
  Ingredients + Cuisine + Dietaries   0.7824  0.9927    0.4324  0.6449
  Ingredients + Cuisine               0.7915  1.0012    0.4384  0.6502
  Ingredients + Dietary               0.7874  0.9986    0.4342  0.6468
  Cuisine + Dietary                   0.8266  1.0616    0.4324  0.7087
  Ingredients                         0.7932  1.0054    0.4411  0.6537
  Cuisine                             0.8553  1.0810    0.5357  0.7431
  Dietary                             0.8772  1.0807    0.4579  0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
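The prototype construction with the fixed threshold can be sketched as follows, assuming a standard Rocchio formulation (positive centroid minus negative centroid) and a hypothetical feature-vector format; the thesis's exact weighting may differ:

```python
from collections import defaultdict

THRESHOLD = 3  # fixed rating threshold in the middle of the 1-5 rating range

def build_prototype(rated_recipes, alpha=1.0, beta=1.0):
    """Rocchio-style prototype: centroid of positively rated recipe vectors
    minus centroid of negatively rated ones. Each element of `rated_recipes`
    is a (rating, feature_vector) pair, a hypothetical format; whether a
    rating of exactly 3 counts as positive or negative is also an assumption."""
    pos, neg = [], []
    for rating, vec in rated_recipes:
        (pos if rating > THRESHOLD else neg).append(vec)
    proto = defaultdict(float)
    for group, sign in ((pos, alpha), (neg, -beta)):
        for vec in group:
            for feature, weight in vec.items():
                proto[feature] += sign * weight / len(group)
    return dict(proto)

rated = [(5, {"garlic": 1.0, "salt": 0.5}), (1, {"sugar": 1.0})]
prototype = build_prototype(rated)  # garlic/salt weighted positive, sugar negative
```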
Computing the user's prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
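The per-feature storage trick can be sketched as follows; the `ing:`/`cui:`/`diet:` key prefixes are hypothetical, used only to keep the three sub-vectors disjoint:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts of feature -> weight)."""
    dot = sum(u.get(f, 0.0) * w for f, w in v.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def merge(prototypes, active):
    """Merge the stored per-feature prototype vectors selected for a test run."""
    merged = {}
    for name in active:
        merged.update(prototypes[name])  # feature keys assumed disjoint
    return merged

user_prototypes = {  # the 3 vectors stored per user
    "ingredients": {"ing:garlic": 0.9, "ing:sugar": -0.4},
    "cuisine": {"cui:italian": 0.7},
    "dietary": {"diet:vegetarian": 0.5},
}
recipe = {"ing:garlic": 1.0, "cui:italian": 1.0}
sim = cosine(merge(user_prototypes, ["ingredients", "cuisine"]), recipe)
```

Each row of Table 5.3 then corresponds to a different `active` list, with no prototype rebuilding.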
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

    Rating = average rating + standard deviation,  if similarity >= U
             average rating,                       if L <= similarity < U
             average rating - standard deviation,  if similarity < L
                                                                         (4.6)
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case thresholds, the objective of this test is to study their impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error values seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:

    Rating = average rating + standard deviation,  if similarity >= U
             average rating,                       if similarity < U
                                                                         (5.1)

Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
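Eq. 5.1 reduces to a one-line mapping (a sketch; `avg` and `std` stand for whichever average/standard-deviation combination is in use, here the user/item combination selected in Section 5.2):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75):
    """Eq. 5.1: add the standard deviation to the average only when the
    Rocchio similarity clears the upper threshold U (0.75 initially)."""
    return avg + std if similarity >= upper else avg

print(similarity_to_rating(0.80, 3.0, 0.5))  # 3.5
print(similarity_to_rating(0.50, 3.0, 0.5))  # 3.0
```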
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
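A small numeric illustration of the MAE/RMSE trade-off discussed above (artificial numbers, not taken from the datasets):

```python
import math

def mae(a, p):
    return sum(abs(x - y) for x, y in zip(a, p)) / len(a)

def rmse(a, p):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, p)) / len(a))

actual = [3, 3, 3, 3]
mostly_exact = [3, 3, 3, 5]          # exact three times, badly off once
never_exact = [3.8, 2.2, 3.8, 2.2]   # never exact, but never far off

# mostly_exact: MAE = 0.5, RMSE = 1.0
# never_exact:  MAE = 0.8, RMSE = 0.8
# -> lower MAE but higher RMSE: the trade-off observed in the graphs
```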
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even to users with high standard deviations.
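The per-user quantities behind these scatter plots can be sketched as follows (hypothetical data layout; the population standard deviation is assumed):

```python
from collections import defaultdict
import math

def user_std_vs_error(reviews, predictions):
    """Pair each user's rating standard deviation with their mean absolute
    prediction error: the two coordinates plotted in Figs. 5.6 and 5.7.
    `reviews` holds (userID, itemID, rating) tuples; hypothetical format."""
    by_user = defaultdict(list)
    for (user, _, rating), pred in zip(reviews, predictions):
        by_user[user].append((rating, pred))
    points = {}
    for user, pairs in by_user.items():
        values = [r for r, _ in pairs]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((r - mean) ** 2 for r in values) / len(values))
        abs_err = sum(abs(r - p) for r, p in pairs) / len(pairs)
        points[user] = (std, abs_err)
    return points
```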
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round, an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Foodcom dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Foodcom dataset up to 500 rated recipes
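The simulation used in this section, growing a user's training set one review at a time, can be sketched as follows; `train_and_predict` is a stand-in for the actual recommender:

```python
def learning_curve(user_reviews, train_and_predict, max_rounds=40):
    """Each round moves one more review into the training set and measures
    the mean absolute error on the remaining (held-out) reviews."""
    errors = []
    for n in range(1, min(max_rounds, len(user_reviews))):
        train, held_out = user_reviews[:n], user_reviews[n:]
        errs = [abs(r - train_and_predict(train, (u, i))) for u, i, r in held_out]
        errors.append(sum(errs) / len(errs))
    return errors

# Dummy recommender: predict the average of the ratings seen so far
avg_predictor = lambda train, query: sum(r for _, _, r in train) / len(train)
reviews = [("u", "r%d" % i, 4 if i % 2 == 0 else 2) for i in range(5)]
curve = learning_curve(reviews, avg_predictor)
```

Averaging such curves over the selected user groups yields the plots in Figs. 5.8 to 5.10.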
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
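Since Eq. 4.4 itself is given in Chapter 4, the sketch below assumes an IDF-style formula, log(N/n_f), as a plausible stand-in for the Inverse Recipe Frequency weight:

```python
import math

def irf_weights(recipes):
    """Inverse Recipe Frequency weights: features appearing in fewer recipes
    get higher weights. log(N / n_f) is an assumption here; the thesis's
    exact formulation is Eq. 4.4."""
    n = len(recipes)
    counts = {}
    for features in recipes:
        for f in set(features):
            counts[f] = counts.get(f, 0) + 1
    return {f: math.log(n / c) for f, c in counts.items()}

recipes = [["salt", "garlic"], ["salt", "sugar"], ["salt", "basil", "garlic"]]
weights = irf_weights(recipes)  # "salt" appears in every recipe -> weight 0
```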
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and adding the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
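This future-work idea can be sketched as follows (hypothetical vectors; one prototype per rating class for a single user):

```python
import math

def cosine(u, v):
    dot = sum(u.get(f, 0.0) * w for f, w in v.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(recipe_vec, class_vectors):
    """One prototype vector per rating class; the predicted rating is simply
    the class whose vector is most similar to the recipe, so no
    similarity-to-rating conversion step is needed."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vec))

class_vectors = {  # hypothetical per-rating prototypes for one user
    5: {"garlic": 0.9, "basil": 0.8},
    1: {"sugar": 0.9},
}
print(predict_rating({"garlic": 1.0}, class_vectors))  # 5
```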
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
algorithm depends on the type of data. The Euclidean distance metric is often chosen when working with structured data. For items represented using the VSM, cosine similarity is commonly adopted. Despite their simplicity, nearest neighbor algorithms are quite effective. Their most important drawback is their inefficiency at classification time, due to the fact that they have no training phase and all the computation is performed at classification time.
These algorithms represent some of the most important methods used in content-based recommendation systems. A thorough review is presented in [5, 7]. Despite their popularity, content-based recommendation systems have several limitations. These methods are constrained to the features explicitly associated with the recommended object, and when these features cannot be parsed automatically by a computer, they have to be assigned manually, which is often not practical due to limitations of resources. Recommended items will also not be significantly different from anything the user has seen before. Moreover, if only items that score highly against a user's profile can be recommended, the similarity between them will also be very high. This problem is typically referred to as overspecialization. Finally, in order to obtain reliable recommendations with implicit user profiles, the user has to rate a sufficient number of items before the content-based recommendation system can understand the user's preferences.
2.1.2 Collaborative Methods
Collaborative methods, or collaborative filtering systems, try to predict the utility of items for a particular user based on the items previously rated by other users. This approach is also known as the wisdom of the crowd, and assumes that users who had similar tastes in the past will have similar tastes in the future. In order to better understand the users' tastes, or preferences, the system has to be given item ratings, either implicitly or explicitly.

Collaborative methods are currently the most prominent approach to generate recommendations, and they have been widely used by large commercial websites. With the existence of various algorithms and variations, these methods are very well understood and applicable in many domains, since a change in item characteristics does not affect the method used to perform the recommendation. These methods can be grouped into two general classes [11], namely memory-based (or heuristic-based) approaches and model-based methods. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by users. In user-to-user collaborative filtering, when predicting the rating of an unknown item p for user c, a set of ratings S is used. This set contains ratings for item p obtained from other users who have already rated that item, usually the N most similar to user c. A simple example of how to generate a prediction, and the steps required to do so, will now be described.
Table 2.1: Ratings database for collaborative recommendation

           Item1  Item2  Item3  Item4  Item5
  Alice      5      3      4      4      ?
  User1      3      1      2      3      3
  User2      4      3      4      3      5
  User3      3      3      1      5      4
  User4      1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. We have that Item5 is unknown to Alice, and the recommendation system needs to generate a prediction. The set of ratings S previously mentioned represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed using the average of the values contained in set S. However, the most common approach is to use the weighted sum, where the level of similarity between users defines the weight value to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. The similarity measure between users is used to simplify the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in m-dimensional space, where m represents the number of rated items they have in common. The similarity measure results from computing the cosine of the angle between the two vectors:
    Similarity(a, b) = Σs∈S r_as · r_bs / ( √(Σs∈S r_as²) · √(Σs∈S r_bs²) )    (2.10)
In the formula, r_as is the rating that user a gave to item s, and r_bs is the rating that user b gave to the same item s. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values between the four items is practically consistent. With the cosine similarity measure, these users are considered highly similar, which may not always be the case, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed, in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]

    sim(a, b) = Σs∈S (r_as − r̄_a)(r_bs − r̄_b) / √( Σs∈S (r_as − r̄_a)² · Σs∈S (r_bs − r̄_b)² )    (2.11)

In the formula, r̄_a and r̄_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction, using a common prediction function:

    pred(a, p) = r̄_a + ( Σb∈N sim(a, b) · (r_bp − r̄_b) ) / ( Σb∈N sim(a, b) )    (2.12)
In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
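The worked example above (Table 2.1 with Eqs. 2.11 and 2.12) can be sketched as follows; note that user averages are computed here over all of a user's ratings, which is one of several possible conventions:

```python
import math

ratings = {  # Table 2.1
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def avg(u):
    return sum(ratings[u].values()) / len(ratings[u])

def pearson(a, b):
    """Eq. 2.11 over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    ra, rb = avg(a), avg(b)
    num = sum((ratings[a][s] - ra) * (ratings[b][s] - rb) for s in common)
    den = math.sqrt(sum((ratings[a][s] - ra) ** 2 for s in common)
                    * sum((ratings[b][s] - rb) ** 2 for s in common))
    return num / den if den else 0.0

def predict(a, item, neighbors):
    """Eq. 2.12: similarity-weighted deviations from the neighbors' averages."""
    num = sum(pearson(a, b) * (ratings[b][item] - avg(b)) for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    return avg(a) + num / den if den else avg(a)

# Predict Alice's rating for Item5 using her two most similar neighbors
print(predict("Alice", "Item5", ["User1", "User2"]))  # about 4.85, above her 4.0 average
```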
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods: the system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section; other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements, in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}   (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}   (2.14)
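Both measures can be sketched directly from the four counts described above; the item-ID sets below are hypothetical:

```python
def precision_recall(recommended, relevant):
    """Compute Precision and Recall from sets of recommended/relevant item IDs."""
    tp = len(recommended & relevant)   # true positives: relevant and recommended
    fp = len(recommended - relevant)   # false positives: recommended, not relevant
    fn = len(relevant - recommended)   # false negatives: relevant, not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Hypothetical example: 3 of the 4 recommended items are actually relevant
p, r = precision_recall({"r1", "r2", "r3", "r4"}, {"r1", "r2", "r3", "r5", "r6"})
print(p, r)  # 0.75 0.6
```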
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:

MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|   (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}   (2.16)
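A minimal sketch of both error measures, with hypothetical predicted and actual rating lists:

```python
import math

def mae(predicted, actual):
    # Mean absolute deviation between predicted and actual ratings
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Square root of the mean squared deviation; large errors weigh more
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

# Hypothetical predictions vs. actual ratings
pred = [4.0, 3.5, 2.0, 5.0]
real = [4.0, 3.0, 4.0, 4.5]
print(mae(pred, real), rmse(pred, real))
```

Note how the single large error (2.0 vs. 4.0) pushes RMSE well above MAE on the same data.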
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be awarded to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^{+}_{k}, an equation based on the idea of TF-IDF is used:

I^{+}_{k} = FF_k \times IRF_k   (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D}   (3.2)
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k}   (3.3)
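The FF-IRF score can be sketched as follows; the frequency counts, period length, and recipe totals are hypothetical numbers chosen for illustration:

```python
import math

def irf(M, M_k):
    # Inverse Recipe Frequency of ingredient k: rarer ingredients score higher
    return math.log(M / M_k)

def favourite_score(F_k, D, M, M_k):
    # I+_k = FF_k x IRF_k, with FF_k = F_k / D
    return (F_k / D) * irf(M, M_k)

# Hypothetical numbers: ingredient used 5 times over a 30-day period,
# appearing in 100 of 10000 recipes
print(round(favourite_score(5, 30, 10000, 100), 3))
```

An ingredient present in every recipe gets IRF = log(1) = 0, so it never dominates the preference profile, mirroring the behaviour of IDF for stop words.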
The user's disliked ingredients I^{-}_{k} are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I^{+}_{k} were computed. The F-measure is computed as follows:

F\text{-measure} = \frac{2 \times Precision \times Recall}{Precision + Recall}   (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^{+}_{k} for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams from two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:

\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}   (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^{+}_{k} and I^{-}_{k}, respectively):

Score(R) = \sum_{k \in R} (I_k \cdot W_k)   (3.6)
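The scoring above can be sketched as follows; the preference values I_k and weights W_k are hypothetical stand-ins (in the cited work, W_k is derived from the ingredient's quantity deviation, whose standard deviation computation is also sketched):

```python
import math

def std_dev(quantities):
    # Dispersion of an ingredient's quantity across the recipes containing it
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe, preference, weight):
    # Score(R): sum of preference x weight over the recipe's ingredients
    return sum(preference[k] * weight[k] for k in recipe)

# Hypothetical data: preference I_k (positive = liked, negative = disliked)
# and a weight W_k standing in for the quantity-deviation-based weight
preference = {"pepper": -0.8, "potato": 0.5, "onion": 0.3}
weight = {"pepper": 2.0, "potato": 1.0, "onion": 1.2}
print(round(recipe_score(["pepper", "potato", "onion"], preference, weight), 2))
```

With a high weight on the disliked pepper, the recipe's total score goes negative, matching the intuition that a heavily-peppered dish should rank low for this user.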
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering, and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach uses the average of the ratings generated by the pure content-based predictor and the pure collaborative method to generate predictions.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and those predicted by the content-based method otherwise:

v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}   (3.7)
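The construction of a pseudo user-ratings vector can be sketched as follows; the movie IDs, ratings, and content-based predictions are hypothetical:

```python
def pseudo_vector(user_ratings, content_predictions, items):
    # Use the actual rating where available, the content-based prediction
    # otherwise, producing a dense ratings vector
    return [user_ratings.get(i, content_predictions[i]) for i in items]

# Hypothetical data: the user rated two of four movies; the content-based
# classifier fills in the remaining entries
items = ["m1", "m2", "m3", "m4"]
rated = {"m1": 5, "m3": 2}
content = {"m1": 4, "m2": 3, "m3": 2, "m4": 4}
print(pseudo_vector(rated, content, items))  # [5, 3, 2, 4]
```

Note that where the user provided a rating ("m1"), it overrides the content-based prediction; the dense vectors of all users then form the pseudo ratings matrix V.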
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores derived from the recipe ratings, after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implements a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others already seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others known by the user probably carry the same information, and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed for the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
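The short-term voting scheme described above can be sketched as follows; the threshold values, story vectors, and interest scores are all hypothetical choices for the example:

```python
import math
from dataclasses import dataclass

def cos(a, b):
    # Cosine similarity between two TF-IDF vectors stored as sparse dicts
    num = sum(a[t] * b.get(t, 0.0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

@dataclass
class Story:
    vec: dict      # TF-IDF weights
    score: float   # user interest score

def predict_score(new_vec, stories, t_min=0.3, t_max=0.95):
    # Stories closer than the minimum threshold become "voting stories";
    # a voter above the maximum threshold labels the new story as known
    voters = [(cos(new_vec, s.vec), s.score) for s in stories]
    voters = [(sim, sc) for sim, sc in voters if sim >= t_min]
    if not voters:
        return None                      # defer to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return "known"                   # user likely saw this event already
    total = sum(sim for sim, _ in voters)
    return sum(sim * sc for sim, sc in voters) / total

# Hypothetical stories and query
s1 = Story({"football": 1.0, "final": 0.5}, score=0.9)
s2 = Story({"election": 1.0}, score=0.2)
print(predict_score({"football": 1.0, "cup": 0.7}, [s1, s2]))
```

The query overlaps only the football story, which votes alone; a query identical to a stored story would instead come back as "known" and be filtered out.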
This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users. The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \cdot \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}   (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarities are computed, the rating prediction is calculated using the following equation:

pred(u, a) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}   (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the predicted rating for user u on item a, and N is the set of items rated by user u. For each item b in this set, the user's rating is weighted according to the similarity between b and the target item a, and the weighted values are normalized by the sum of the similarities to obtain the predicted rating.
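The item-to-item computation can be sketched over a small hypothetical ratings matrix; note that the normalization below uses absolute similarity values, a common guard so that negative correlations do not flip the sign of the weighting:

```python
import math

# Hypothetical ratings: users -> {recipe: rating}
ratings = {
    "u1": {"a": 4, "b": 5, "c": 2},
    "u2": {"a": 3, "b": 4},
    "u3": {"a": 5, "b": 5, "c": 1},
}

def item_avg(i):
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def item_sim(a, b):
    # Pearson correlation over the users P who rated both recipes
    P = [u for u, r in ratings.items() if a in r and b in r]
    ra, rb = item_avg(a), item_avg(b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in P)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in P)) * \
          math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in P))
    return num / den if den else 0.0

def predict(u, a):
    # Weight the user's mean-centred ratings of other recipes by their
    # similarity to the target recipe a
    N = [b for b in ratings[u] if b != a]
    num = sum(item_sim(a, b) * (ratings[u][b] - item_avg(b)) for b in N)
    den = sum(abs(item_sim(a, b)) for b in N)
    return item_avg(a) + num / den if den else item_avg(a)

print(round(predict("u2", "c"), 2))
```

Recipe c is negatively correlated with recipe a, and u2 rated a below its average, so the prediction lands above c's own average.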
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending from a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide better computational performance, and comparable or better quality results, than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
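The profile construction and cosine ranking can be sketched as follows; the feature vocabulary and the feature names are hypothetical:

```python
import math

# Hypothetical feature vocabulary; each feature has a fixed vector position
FEATURES = ["cat:dessert", "cat:main", "region:north", "ing:chocolate",
            "ing:egg", "ing:cod"]

def to_vector(features):
    # Binary vector: 1 if the recipe/profile contains the feature
    return [1 if f in features else 0 for f in FEATURES]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

# The user positively rated a chocolate dessert, so its features were
# added to the profile; candidate recipes are then ranked by cosine similarity
profile = to_vector({"cat:dessert", "ing:chocolate", "ing:egg"})
recipe1 = to_vector({"cat:dessert", "ing:chocolate"})          # similar
recipe2 = to_vector({"cat:main", "region:north", "ing:cod"})   # dissimilar
print(cosine(profile, recipe1) > cosine(profile, recipe2))  # True
```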
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}   (4.3)
Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation stems from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure can be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}   (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations, and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in the next Section, 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
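The prototype-vector construction (using the first approach, with a fixed rating threshold) can be sketched as follows; the recipe dataset, user ratings, and threshold value are hypothetical:

```python
import math

def irf_weights(recipes):
    # IRF weight for every feature, computed over the whole dataset
    M = len(recipes)
    counts = {}
    for r in recipes:
        for k in r:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / n) for k, n in counts.items()}

def prototype(rated, weights, threshold=3):
    # Add the recipe's feature weights on a positive observation,
    # subtract them on a negative one
    proto = {}
    for recipe, rating in rated:
        sign = 1 if rating >= threshold else -1
        for k in recipe:
            proto[k] = proto.get(k, 0.0) + sign * weights[k]
    return proto

# Hypothetical dataset and user history
recipes = [{"egg", "sugar"}, {"egg", "cod"}, {"sugar", "flour"}, {"cod", "rice"}]
w = irf_weights(recipes)
user = [({"egg", "sugar"}, 4), ({"egg", "cod"}, 1)]
proto = prototype(user, w)
print(proto["sugar"] > 0 and proto["cod"] < 0)  # True
```

The feature "egg" appears in one positive and one negative observation, so its contributions cancel out, while "sugar" and "cod" end up clearly positive and negative, respectively.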
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users on recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value; this problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or share the same notion of high or low rating values. The following steps were thus applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
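The per-user mapping above can be sketched as follows; `sim_min`/`sim_max` and `rating_min`/`rating_max` stand for the per-user similarity and rating variations described in the text, and all names are illustrative:

```python
def minmax_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a cosine similarity onto the user's rating scale (Eq. 4.5).

    sim_min/sim_max bound the user's observed similarities and
    rating_min/rating_max the user's observed ratings. When the
    similarity interval collapses (too few ratings), the user average
    is returned as the default, as described in the text.
    """
    if sim_max == sim_min:
        return user_avg
    scaled = (similarity - sim_min) / (sim_max - sim_min)
    return scaled * (rating_max - rating_min) + rating_min
```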
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating =
    average rating + standard deviation,  if similarity ≥ U
    average rating,                       if L ≤ similarity < U
    average rating − standard deviation,  if similarity < L        (4.6)
Three different approaches were tested: using the user's rating average and standard deviation, using the recipe's rating average and standard deviation, and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com   Epicurious
Number of users                         24741        8117
Number of food items                   226025       14976
Number of rating events                956826       86574
Number of ratings above avg            726467       46588
Number of groups                          108          68
Number of ingredients                    5074         338
Number of categories                       28          14
Sparsity on the ratings matrix          0.02%       0.07%
Avg rating value                         4.68        3.34
Avg number of ratings per user          38.67       10.67
Avg number of ratings per item           4.23        5.78
Avg number of ingredients per item       8.57        3.71
Avg number of categories per item        2.33        0.60
Avg number of food groups per item       0.87        0.61
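The rule in Eq. 4.6 can be sketched directly; `avg` and `std` may come from the user, the recipe, or their combination, matching the three variants tested, and the function name is illustrative:

```python
def threshold_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Turn a similarity value into a rating via Eq. 4.6.

    The default thresholds U = 0.75 and L = 0.25 are the initial
    values used before the optimization performed in Chapter 5.
    """
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```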
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community⁴. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴http://www.food.com
⁵http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way the ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
⁶http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set; ideally, it is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
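The fold construction can be sketched as below; the interleaved split is an illustrative choice, since the text does not specify how the folds were drawn:

```python
def five_fold_splits(rating_events, k=5):
    """Split the rating events into k folds for cross-validation.

    Each round uses one fold (20% of the data) as the validation set
    and the remaining folds (80%) as the training set; the reported
    error is the average over the k rounds.
    """
    folds = [rating_events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, f in enumerate(folds) if j != i for e in f]
        yield training, validation
```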
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
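A minimal sketch of the two measures as used here (the exact definitions are given in Section 2.2):

```python
import math

def mae_rmse(predicted, actual):
    """Compute MAE and RMSE between predicted and actual ratings.

    RMSE squares each deviation before averaging, which is why it
    penalizes large errors more heavily than MAE.
    """
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```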
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                   Epicurious         Food.com
                                 MAE     RMSE      MAE     RMSE
YoLP Content-based component     0.6389  0.8279    0.3590  0.6536
YoLP Collaborative component     0.6454  0.8678    0.3761  0.6834
User Average                     0.6315  0.8338    0.4077  0.6207
Item Average                     0.7701  1.0930    0.4385  0.7043
Combined Average                 0.6628  0.8572    0.4180  0.6250

Table 5.2: Test results

                                            Epicurious                           Food.com
                               Observation      Observation       Observation      Observation
                               User Average     Fixed Threshold   User Average     Fixed Threshold
                               MAE     RMSE     MAE     RMSE      MAE     RMSE     MAE     RMSE
User Avg + User Standard
Deviation                      0.8217  1.0606   0.7759  1.0283    0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard
Deviation                      0.8914  1.1550   0.8388  1.1106    0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and
Item Standard Deviation        0.8304  1.0296   0.7824  0.9927    0.4390  0.6506   0.4324  0.6449
Min-Max                        0.8539  1.1533   0.7721  1.0705    0.6648  0.9847   0.6303  0.9384
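The three average baselines of Table 5.1 amount to a one-line lookup each; a sketch, with illustrative names:

```python
def baseline_predictions(user_avg, item_avg):
    """The three simple average baselines from Table 5.1.

    Given the user's and the recipe's average ratings from the
    training set, each baseline returns a fixed combination of them
    as the predicted rating.
    """
    return {
        "user_average": user_avg,
        "item_average": item_avg,
        "combined_average": (user_avg + item_avg) / 2,
    }
```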
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious         Food.com
                                    MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616    0.4324  0.7087
Ingredients                         0.7932  1.0054    0.4411  0.6537
Cuisine                             0.8553  1.0810    0.5357  0.7431
Dietary                             0.8772  1.0807    0.4579  0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the best-performing method combination was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
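A sketch of this representation: each user keeps one partial prototype per feature type, and any combination from Table 5.3 is obtained by merging a subset of them before the cosine similarity is computed. Names are illustrative; merging by addition is safe because the three feature namespaces do not overlap.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge_prototypes(stored, active_features):
    """Merge the per-feature prototype vectors kept for each user.

    `stored` maps a feature type ("ingredients", "cuisine", "dietary")
    to its partial prototype; selecting a subset of the keys reproduces
    any feature combination without rebuilding the prototypes.
    """
    merged = {}
    for name in active_features:
        for f, w in stored[name].items():
            merged[f] = merged.get(f, 0.0) + w
    return merged
```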
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating =
    average rating + standard deviation,  if similarity ≥ U
    average rating,                       if L ≤ similarity < U
    average rating − standard deviation,  if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
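The sweep itself is a simple grid search. In the sketch below, `evaluate` is assumed to run the full 5-fold cross-validation for a given (L, U) pair and return (MAE, RMSE); names are illustrative.

```python
def sweep_lower_threshold(evaluate, lows, upper=0.75):
    """Grid-search the lower similarity threshold L.

    `evaluate(L, U)` is assumed to return (mae, rmse) for one threshold
    configuration; the L with the lowest MAE is selected, mirroring
    the test behind Figures 5.2 and 5.3.
    """
    results = {L: evaluate(L, upper) for L in lows}
    best = min(results, key=lambda L: results[L][0])
    return best, results
```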
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating =
    average rating + standard deviation,  if similarity ≥ U
    average rating,                       if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
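The simulation loop described above can be sketched for a single user as follows; `train_and_score` is assumed to rebuild the prototype from the training portion and return the error on the rest (names are illustrative):

```python
def learning_curve(user_ratings, train_and_score, max_train=40):
    """Simulate the incremental learning of Rocchio for one user.

    `user_ratings` is the user's ordered list of rated recipes. One
    review moves from the validation set to the training set per
    round, as in the test above, and the error after each round is
    recorded.
    """
    curve = []
    for n in range(1, min(max_train, len(user_ratings) - 1) + 1):
        training, validation = user_ratings[:n], user_ratings[n:]
        curve.append((n, train_and_score(training, validation)))
    return curve
```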
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information become available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
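Under this proposal, prediction reduces to an argmax over the per-rating class vectors; a sketch of the idea (not an implemented component; names are illustrative):

```python
def predict_by_class(recipe_vector, class_vectors, similarity):
    """Predict a rating with per-rating class vectors, as proposed above.

    `class_vectors` maps each rating value to the prototype built from
    the recipes the user rated with that value; the rating whose class
    vector is most similar to the recipe is returned directly, so no
    similarity-to-rating transformation is needed.
    """
    return max(class_vectors,
               key=lambda r: similarity(recipe_vector, class_vectors[r]))
```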
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http://link.springer.com/10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link.springer.com/10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Lshii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http://dl.acm.org/citation.cfm?id=372071.

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
Table 2.1: Ratings database for collaborative recommendation

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Table 2.1 contains a set of users with five items in common between them, namely Item1 to Item5. Item5 is unknown to Alice, and the recommendation system needs to generate a prediction for it. The set of ratings S mentioned previously represents the ratings given by User1, User2, User3, and User4 to Item5. These values will be used to predict the rating that Alice would give to Item5. In the simplest case, the predicted rating is computed as the average of the values contained in the set S. However, the most common approach is to use a weighted sum, where the level of similarity between users defines the weight to use when computing the rating. For example, the rating given by the user most similar to Alice will have the highest weight when computing the prediction. A similarity measure between users is used to support the rating estimation procedure [12]. Two users have a high similarity value when they both rate the same group of items in an identical way. With the cosine similarity measure, two users are treated as two vectors in an m-dimensional space, where m represents the number of rated items they have in common. The similarity results from computing the cosine of the angle between the two vectors:
Similarity(a, b) = \frac{\sum_{s \in S} r_{as} \, r_{bs}}{\sqrt{\sum_{s \in S} r_{as}^2} \, \sqrt{\sum_{s \in S} r_{bs}^2}}   (2.10)
In the formula, r_as is the rating that user a gave to item s, and r_bs is the rating that user b gave to the same item s. However, this measure does not take an important factor into consideration, namely the differences in rating behaviour between users.
In Figure 2.2, it can be observed that Alice and User1 classified the same group of items in a similar way: the difference in rating values across the four items is practically constant. With the cosine similarity measure, these users are considered highly similar, which may not always be accurate, since only the items they have in common are contemplated. In fact, if Alice usually rates items with low values, we can conclude that these four items are amongst her favourites. On the other hand, if User1 often gives high ratings to items, these four are the ones he likes the least. It is then clear that the average ratings of each user should be analyzed in order to consider the differences in user behaviour. The Pearson correlation coefficient is a popular measure in user-based collaborative filtering that takes this fact into account:
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{as} - \bar{r}_a)(r_{bs} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{as} - \bar{r}_a)^2 \sum_{s \in S} (r_{bs} - \bar{r}_b)^2}}   (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{bp} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}   (2.12)
In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function determines whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their averages. The rating differences are combined using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
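As an illustration, the prediction for Alice in Table 2.1 can be reproduced with a short Python sketch of Eqs. (2.11) and (2.12). Restricting the neighbourhood N to positively correlated users is an assumption made here for clarity; practical systems typically select the k nearest neighbours instead.

```python
import math

# Ratings from Table 2.1; Alice's rating for Item5 is the one to predict.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def mean(user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def pearson(a, b):
    # Eq. (2.11): correlation over the items S rated by both users.
    s_items = set(ratings[a]) & set(ratings[b])
    ma, mb = mean(a), mean(b)
    num = sum((ratings[a][s] - ma) * (ratings[b][s] - mb) for s in s_items)
    den = math.sqrt(sum((ratings[a][s] - ma) ** 2 for s in s_items)) * \
          math.sqrt(sum((ratings[b][s] - mb) ** 2 for s in s_items))
    return num / den if den else 0.0

def predict(a, p, candidates):
    # Eq. (2.12): mean-centred weighted average over the positively
    # correlated neighbours N that rated item p.
    neighbours = [(pearson(a, b), b) for b in candidates if p in ratings[b]]
    neighbours = [(w, b) for w, b in neighbours if w > 0]
    if not neighbours:
        return mean(a)
    num = sum(w * (ratings[b][p] - mean(b)) for w, b in neighbours)
    den = sum(w for w, _ in neighbours)
    return mean(a) + num / den

print(round(predict("Alice", "Item5", ["User1", "User2", "User3", "User4"]), 2))
```

User1 and User2 rate like Alice and both liked Item5, while the negatively correlated User4 is excluded, so the predicted rating comes out well above Alice's average of 4.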
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and to recalculate them only occasionally, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence suggests that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various other probabilistic modelling techniques.
The new-user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section; other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until a new item has been rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings and even reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].

In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increase in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation task is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified into one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}   (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}   (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|   (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}   (2.16)
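A minimal sketch of Eqs. (2.15) and (2.16), with made-up prediction values, shows how RMSE penalizes a single large deviation more heavily than MAE does:

```python
import math

def mae(predicted, actual):
    # Eq. (2.15): mean absolute deviation between predicted and actual ratings.
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Eq. (2.16): squaring the deviations gives large errors more weight.
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.5, 3.0, 2.0, 5.0]
actual    = [4,   3,   4,   5]
print(mae(predicted, actual), round(rmse(predicted, actual), 3))
```

On this toy data, the single two-star miss pushes the RMSE well above the MAE, even though both summarize the same four errors.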
The RMSE measure was used in the famous Netflix competition,1 where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from the recipe browsing history (i.e., recipes searched) and the cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^+_k, an equation based on the idea of TF-IDF is used:
I^+_k = FF_k \times IRF_k   (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:
FF_k = \frac{F_k}{D}   (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the inverse recipe frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):
IRF_k = \log \frac{M}{M_k}   (3.3)
The user's disliked ingredients I^-_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
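The estimation in Eqs. (3.1)–(3.3) can be sketched as follows. The usage counts, period, and recipe statistics below are hypothetical, and a base-10 logarithm is assumed (the base is not fixed above; changing it only rescales all scores uniformly):

```python
import math

# Hypothetical browsing/cooking history over a period of D days.
D = 30
use_counts = {"onion": 12, "chicken": 8, "cinnamon": 2}        # F_k
total_recipes = 1000                                           # M
recipes_with = {"onion": 600, "chicken": 300, "cinnamon": 50}  # M_k

def favourite_score(k):
    ff = use_counts[k] / D                               # Eq. (3.2)
    irf = math.log10(total_recipes / recipes_with[k])    # Eq. (3.3)
    return ff * irf                                      # Eq. (3.1)

# Frequent but ubiquitous ingredients (onion) are discounted by IRF_k,
# so a rarer ingredient the user cooks often can rank first.
ranking = sorted(use_counts, key=favourite_score, reverse=True)
print(ranking)
```

Here onion is used most often, but its IRF discount lets chicken take the top of the favourite-ingredient ranking, which is exactly the TF-IDF intuition carried over to recipes.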
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad,1 one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I^+_k were computed. The F-measure is computed as follows:
F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}   (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in its evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user dislikes an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}   (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^+_k and I^-_k, respectively):
Score(R) = \sum_{k \in R} I_k \cdot W_k   (3.6)
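Eqs. (3.5) and (3.6) can be sketched as follows. The exact mapping from deviation to the weight W_k is not reproduced above, so the normalized deviation |q − ḡ_k| / σ_k used below is a placeholder assumption, as are the quantities and preference values:

```python
import math

# Hypothetical ingredient quantities (grams) across recipes in the database.
quantities = {"pepper": [2, 3, 1, 2], "potato": [200, 250, 150, 200]}

def std_dev(k):
    # Eq. (3.5): dispersion of ingredient k's quantity across recipes.
    g = quantities[k]
    avg = sum(g) / len(g)
    return math.sqrt(sum((x - avg) ** 2 for x in g) / len(g))

def weight(k, qty):
    # Placeholder W_k: how unusual the target recipe's quantity of k is,
    # measured in standard deviations from the usual quantity.
    avg = sum(quantities[k]) / len(quantities[k])
    sigma = std_dev(k)
    return abs(qty - avg) / sigma if sigma else 0.0

preferences = {"pepper": -1.0, "potato": 0.5}   # I_k: liked (+) / disliked (-)

def score(recipe):
    # Eq. (3.6): preference times weight, summed over the recipe's ingredients.
    return sum(preferences[k] * weight(k, qty) for k, qty in recipe.items())

# A recipe with an unusually large amount of a disliked ingredient (pepper)
# scores strongly negative, even though it also contains a liked one.
print(round(score({"pepper": 6, "potato": 210}), 2))
```

Because pepper's usual dispersion is tiny, six grams of it deviates by many σ and dominates the score, matching the pepper-versus-potato intuition in the text.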
The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm: user-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions using the average of the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method, otherwise:
v_{ui} = \begin{cases} r_{ui} & \text{if user } u \text{ rated item } i \\ c_{ui} & \text{otherwise} \end{cases}   (3.7)
Using the pseudo user-ratings vectors of all users, the dense pseudo-ratings matrix V is created, and the similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector depends on the number of items the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Therefore, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
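The construction of the dense pseudo-ratings matrix V from Eq. (3.7) can be sketched as follows, with a constant stub standing in for the naive Bayes content-based predictor c_ui (the item names and ratings are hypothetical):

```python
# Eq. (3.7): fill each user's rating vector with actual ratings where they
# exist and with content-based predictions (here a stub) elsewhere.
items = ["m1", "m2", "m3"]
actual = {"u1": {"m1": 5, "m3": 2}, "u2": {"m2": 4}}

def content_prediction(user, item):
    # Stand-in for the naive Bayes content predictor c_ui described above.
    return 3.0

def pseudo_vector(user):
    return [actual[user].get(i, content_prediction(user, i)) for i in items]

# Every row of V is now dense, so Pearson similarities between users can
# be computed without missing entries.
dense_matrix = {u: pseudo_vector(u) for u in actual}
print(dense_matrix)
```

The point of the construction is visible immediately: u1 and u2 share no rated item, yet their pseudo vectors are fully comparable.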
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned, including cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores; with the user's ingredient scores, a prediction is then computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix; this matrix is then used by the collaborative approach to generate recommendations. The two strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that the common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as the evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve: as mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction; in some cases, items that are too similar to others that have already been seen should not be recommended either. This idea is used in Daily Learner [23], a well-known content-based recommendation system for news articles. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others already known by the user probably carry the same information, and will not help him gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should make great recommendations in the context of this system. The use of similarity can thus be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily Learner uses the nearest-neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest-neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily Learner uses this method as it can quickly adapt to a user's novel interests: the main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed for the algorithm to identify future follow-up stories.
Stories in Daily Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average of all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., above a maximum similarity value) to the new story, the new story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model, and it is passed to the long-term model, explained in more detail in [23].
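The short-term voting scheme can be sketched as follows; the threshold values and story data are illustrative assumptions, not the parameters used by Daily Learner:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_story(new_vec, stories, t_min=0.3, t_max=0.95):
    # Stories above t_min become voters; any voter above t_max marks the
    # new story as already known and hence not worth recommending.
    voters = [(cosine(new_vec, s["vec"]), s["score"]) for s in stories]
    voters = [(w, score) for w, score in voters if w >= t_min]
    if not voters:
        return None, False          # defer to the long-term model
    known = any(w >= t_max for w, _ in voters)
    prediction = sum(w * sc for w, sc in voters) / sum(w for w, _ in voters)
    return prediction, known

stories = [{"vec": [1.0, 0.2], "score": 4}, {"vec": [0.0, 1.0], "score": 2}]
print(predict_story([1.0, 0.1], stories))
```

A near-duplicate of a liked story gets a high predicted score but is flagged as known, which is precisely the variety mechanism described above.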
This issue should be taken into consideration in food recommendation as well, since users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Figure 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics; the methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.1
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2. In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2 \sum_{p \in P} (r_{bp} - \bar{r}_b)^2}}   (4.1)
where a and b are recipes, r_ap is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarities are computed, the rating prediction is calculated using the following equation:
pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{ub}}{\sum_{b \in N} sim(a, b)}   (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the result is normalized by the sum of the similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and to compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence suggests that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
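A small sketch of this profile construction and matching, assuming a tiny hypothetical feature vocabulary (the real YoLP vectors are sparse and also carry the context slots described above):

```python
import math

# Hypothetical binary feature vocabulary; the real YoLP vectors also reserve
# slots for category, region, restaurant ID, and context features.
FEATURES = ["cat:dessert", "region:north", "ing:egg", "ing:sugar", "ing:basil"]

def to_vector(active):
    """Binary vector with a fixed slot per known feature."""
    return [1.0 if f in active else 0.0 for f in FEATURES]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Only positively rated recipes (4 or 5) contribute features to the profile.
profile = [0.0] * len(FEATURES)
for features, rating in [({"cat:dessert", "ing:egg", "ing:sugar"}, 5),
                         ({"ing:basil"}, 2)]:
    if rating >= 4:
        profile = [max(p, x) for p, x in zip(profile, to_vector(features))]

candidate = to_vector({"cat:dessert", "ing:sugar"})
score = cosine(profile, candidate)
```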
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the Frequency of use of the feature F_k is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
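The IRF weighting (Eq. 4.4) and this add/subtract update can be sketched as follows; the toy recipe corpus and the fixed positive/negative threshold of 3 are illustrative assumptions:

```python
import math
from collections import defaultdict

# Toy corpus: each recipe is a set of features (hypothetical names).
recipes = {
    "r1": {"ing:egg", "ing:sugar", "cuisine:french"},
    "r2": {"ing:egg", "ing:basil"},
    "r3": {"ing:basil", "cuisine:italian"},
}

# IRF_k = log(M / M_k), Eq. 4.4, computed over the complete corpus.
M = len(recipes)
df = defaultdict(int)
for feats in recipes.values():
    for f in feats:
        df[f] += 1
irf = {f: math.log(M / n) for f, n in df.items()}

def prototype(rated, threshold=3):
    """Rocchio prototype: add IRF weights for positive observations
    (rating >= threshold) and subtract them for negative ones."""
    proto = defaultdict(float)
    for recipe, rating in rated:
        sign = 1.0 if rating >= threshold else -1.0
        for f in recipes[recipe]:
            proto[f] += sign * irf[f]
    return dict(proto)

vec = prototype([("r1", 4), ("r3", 1)])
```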
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max A − min A), the user average was used as the default for the recommendation.
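A small sketch of this per-user mapping, under the assumption that the user's similarity and rating histories are available as plain lists:

```python
def min_max_rating(sim_value, sim_history, rating_history):
    """Map a similarity into the user's rating range (Eq. 4.5).

    `sim_history` would come from the validation set and `rating_history`
    from the training set; when the similarity interval collapses, fall
    back to the user's average rating, as described above."""
    a_min, a_max = min(sim_history), max(sim_history)
    c, d = min(rating_history), max(rating_history)
    if a_max == a_min:
        return sum(rating_history) / len(rating_history)
    return (sim_value - a_min) / (a_max - a_min) * (d - c) + c
```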
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)
Three different variants were tested: using the user's rating average and the user standard deviation, using the recipe's rating average and the recipe standard deviation, and using the combination of the user and recipe averages and standard deviations.
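Eq. 4.6 reduces to a three-way threshold rule. A sketch with the initial thresholds, where the average/standard deviation pair is whichever of the three variants is being tested:

```python
def rating_from_similarity(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6 with the initial thresholds U = 0.75 and L = 0.25."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```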
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Foodcom   Epicurious
Number of users                          24,741        8,117
Number of food items                    226,025       14,976
Number of rating events                 956,826       86,574
Number of ratings above avg             726,467       46,588
Number of groups                            108           68
Number of ingredients                     5,074          338
Number of categories                         28           14
Sparsity on the ratings matrix            0.02%        0.07%
Avg. rating values                         4.68         3.34
Avg. number of ratings per user           38.67        10.67
Avg. number of ratings per item            4.23         5.78
Avg. number of ingredients per item        8.57         3.71
Avg. number of categories per item         2.33         0.60
Avg. number of food groups per item        0.87         0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community4. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Foodcom rating events per rating values
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data on the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds and the process repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
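The fold construction described above can be sketched as follows; the shuffling seed and the list-of-events representation are illustrative assumptions:

```python
import random

def five_fold(events, seed=0):
    """Yield (training, validation) pairs: each of the 5 rounds holds out
    one fold (20% of the events) and trains on the remaining 80%."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    for k in range(5):
        validation = folds[k]
        training = [e for i, fold in enumerate(folds) if i != k for e in fold]
        yield training, validation
```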
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
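For reference, a minimal sketch of the two error measures over aligned lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error; larger deviations weigh more heavily."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```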
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Foodcom
                                 MAE     RMSE      MAE     RMSE
YoLP Content-based component    0.6389  0.8279   0.3590  0.6536
YoLP Collaborative component    0.6454  0.8678   0.3761  0.6834
User Average                    0.6315  0.8338   0.4077  0.6207
Item Average                    0.7701  1.0930   0.4385  0.7043
Combined Average                0.6628  0.8572   0.4180  0.6250

Table 5.2: Test Results

                                            Epicurious                           Foodcom
                                  Observation      Observation        Observation      Observation
                                  User Average     Fixed Threshold    User Average     Fixed Threshold
                                  MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Std. Deviation    0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Std. Deviation    0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item
Std. Deviation                    0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                           0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious          Foodcom
                                   MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries 0.7824  0.9927   0.4324  0.6449
Ingredients + Cuisine             0.7915  1.0012   0.4384  0.6502
Ingredients + Dietary             0.7874  0.9986   0.4342  0.6468
Cuisine + Dietary                 0.8266  1.0616   0.4324  0.7087
Ingredients                       0.7932  1.0054   0.4411  0.6537
Cuisine                           0.8553  1.0810   0.5357  0.7431
Dietary                           0.8772  1.0807   0.4579  0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
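A sketch of this per-family storage and merging, with hypothetical feature names and weights:

```python
# Hypothetical per-user storage: one weight sub-vector per feature family,
# so any feature combination can be tested without rebuilding prototypes.
stored = {
    "ingredients": {"ing:egg": 0.4, "ing:basil": -0.2},
    "cuisine": {"cuisine:french": 1.1},
    "dietary": {"diet:vegan": 0.7},
}

def merge(families):
    """Merge the stored sub-vectors for the requested feature families."""
    merged = {}
    for family in families:
        merged.update(stored[family])
    return merged

# e.g. the "Ingredients + Cuisine" row of Table 5.3:
vec = merge(["ingredients", "cuisine"])
```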
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user preferences and items the user dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}
The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Foodcom datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users who rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Foodcom, 1,571 users were found who rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes
Figure 5.9: Learning Curve using the Foodcom dataset up to 100 rated recipes
Figure 5.10: Learning Curve using the Foodcom dataset up to 500 rated recipes
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail both from the recipes and from the prototype vectors, and, adding the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of the day (e.g., lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
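This multi-prototype idea can be sketched as one mean vector per rating value, with the predicted rating given by the most similar class vector. The vectors and values below are illustrative assumptions:

```python
# One class vector per rating value; prediction = rating of the most
# similar class vector (cosine similarity).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_class_vectors(rated):
    """rated: list of (vector, rating). Returns {rating: mean vector}."""
    classes = {}
    for v, r in rated:
        classes.setdefault(r, []).append(v)
    return {r: [sum(col) / len(vs) for col in zip(*vs)] for r, vs in classes.items()}

def predict_rating(class_vectors, recipe):
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe))

classes = build_class_vectors([([1.0, 0.0], 5), ([0.9, 0.1], 5), ([0.0, 1.0], 1)])
predicted = predict_rating(classes, [1.0, 0.2])  # closest to the 5-star class
```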
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modelling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation. Volume 8444, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Figure 2.2: Comparing user ratings [2]
sim(a, b) = \frac{\sum_{s \in S} (r_{a,s} - \bar{r}_a)(r_{b,s} - \bar{r}_b)}{\sqrt{\sum_{s \in S} (r_{a,s} - \bar{r}_a)^2} \sqrt{\sum_{s \in S} (r_{b,s} - \bar{r}_b)^2}}   (2.11)
In the formula, \bar{r}_a and \bar{r}_b are the average ratings of user a and user b, respectively.
With the similarity values between Alice and the other users, obtained using either of these two similarity measures, we can now generate a prediction using a common prediction function:
pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}   (2.12)
In the formula, pred(a, p) is the prediction value for user a and item p, and N is the set of users most similar to user a that rated item p. This function calculates whether the neighbors' ratings for Alice's unseen Item5 are higher or lower than their average. The rating differences are combined, using the similarity scores as weights, and the resulting value is added to or subtracted from Alice's average rating. The value obtained through this procedure corresponds to the predicted rating.
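The two formulas above can be sketched directly in code: Pearson similarity over co-rated items (Eq. 2.11) and the mean-offset weighted prediction (Eq. 2.12). The toy rating dictionaries are illustrative assumptions; as in the formula shown, the denominator uses the raw (not absolute) similarity sum, so in practice neighbors are usually restricted to positively correlated users.

```python
# User-based collaborative filtering: Pearson similarity + weighted
# deviation-from-mean prediction, on toy ratings dictionaries.
import math

def pearson(ratings_a, ratings_b):
    """Eq. 2.11 over the co-rated items S, with each user's overall mean rating."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[s] - mean_a) * (ratings_b[s] - mean_b) for s in shared)
    den = math.sqrt(sum((ratings_a[s] - mean_a) ** 2 for s in shared)) * \
          math.sqrt(sum((ratings_b[s] - mean_b) ** 2 for s in shared))
    return num / den if den else 0.0

def predict(target, neighbors, item):
    """Eq. 2.12: neighbors is a list of rating dicts from users who rated `item`."""
    mean_t = sum(target.values()) / len(target)
    sims = [(pearson(target, n), n) for n in neighbors]
    num = sum(s * (n[item] - sum(n.values()) / len(n)) for s, n in sims)
    den = sum(s for s, _ in sims)
    if den == 0:
        return mean_t
    return mean_t + num / den

alice = {"i1": 5, "i2": 3, "i4": 4}
user1 = {"i1": 4, "i2": 3, "i4": 3, "i5": 5}
user2 = {"i1": 5, "i2": 2, "i4": 5, "i5": 4}
score = predict(alice, [user1, user2], "i5")  # prediction for Alice's unseen i5
```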
Different recommendation systems may take different approaches in order to implement user similarity calculations and rating estimations as efficiently as possible. According to [12], one common strategy is to calculate all user similarities sim(a, b) in advance and recalculate them only once in a while, since the network of peers usually does not change dramatically in a short period of time. Then, when a user requires a recommendation, the ratings can be efficiently calculated on demand, using the precomputed similarities. Many other performance-improving modifications have been proposed to extend the standard correlation-based and cosine-based techniques [11, 13, 14].
The techniques presented above have been traditionally used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
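The item-based variant can be sketched by applying the same cosine measure to item rating columns instead of user rows. The tiny ratings matrix below is an illustrative assumption:

```python
# Item-based similarity: item vectors are the columns of the
# user-item ratings matrix, compared with cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Rows = users, columns = items.
ratings = [[5.0, 4.0, 1.0],
           [4.0, 5.0, 2.0],
           [1.0, 2.0, 5.0]]
item_vectors = list(zip(*ratings))              # transpose: one vector per item
sim_0_1 = cosine(item_vectors[0], item_vectors[1])  # items 0 and 1: rated alike
sim_0_2 = cosine(item_vectors[0], item_vectors[2])  # items 0 and 2: rated oppositely
```

A prediction for an unseen item then combines the user's own ratings of the most similar items, weighted by these similarities.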
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a Naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, or various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem. Most of them use the hybrid recommendation approach presented in the next section. Other techniques use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered: the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities, in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even reach desirable properties not present in the individual approaches. Monolithic, parallel, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components receive the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
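The weighted parallelized design can be sketched as two recommenders scoring the same items independently, with a fixed weight blending their predictions. The component scores and the 0.7/0.3 weights below are illustrative assumptions:

```python
# Parallelized (weighted) hybridization: blend two independent
# {item: score} prediction dictionaries over their common items.
def weighted_hybrid(predictions_a, predictions_b, weight_a=0.7):
    weight_b = 1.0 - weight_a
    return {item: weight_a * predictions_a[item] + weight_b * predictions_b[item]
            for item in predictions_a.keys() & predictions_b.keys()}

collaborative = {"recipe1": 4.0, "recipe2": 2.0}
content_based = {"recipe1": 3.0, "recipe2": 5.0}
blended = weighted_hybrid(collaborative, content_based)
```

In a voting scheme, the weights would instead be replaced by a per-item majority decision, and in dynamic variants the weights are learned from past prediction accuracy.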
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
Precision = \frac{tp}{tp + fp}   (2.13)
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
Recall = \frac{tp}{tp + fn}   (2.14)
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|   (2.15)
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}   (2.16)
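Eqs. 2.15 and 2.16 can be sketched in a few lines: MAE averages the absolute deviations between predicted and actual ratings, while RMSE squares them first, so larger errors weigh more. The sample ratings are illustrative:

```python
# MAE (Eq. 2.15) and RMSE (Eq. 2.16) over paired prediction/rating lists.
import math

def mae(predicted, actual):
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]
actual = [3.0, 3.0, 3.0, 2.0]
# MAE = (1 + 0 + 2 + 0) / 4 = 0.75; RMSE = sqrt((1 + 0 + 4 + 0) / 4) ≈ 1.118
```

Note how the single 2-point error dominates the RMSE but contributes proportionally to the MAE.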
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
¹http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I^+_k, an equation based on the idea of TF-IDF is used:
I^+_k = FF_k \times IRF_k   (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = \frac{F_k}{D}   (3.2)
The notion of IDF (inverse document frequency) is specified in Eq. (4.4) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = \log \frac{M}{M_k}   (3.3)
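Eqs. 3.1-3.3 can be sketched directly. The counts below (uses, period, recipe totals) are illustrative assumptions; the point is that rarity (IRF) can outweigh raw frequency of use:

```python
# TF-IDF-style favourite-ingredient score: FF_k (Eq. 3.2) times
# IRF_k (Eq. 3.3), as in Eq. 3.1.
import math

def favourite_score(uses_k, period_days, total_recipes, recipes_with_k):
    ff = uses_k / period_days                       # Eq. 3.2: frequency of use
    irf = math.log(total_recipes / recipes_with_k)  # Eq. 3.3: inverse recipe frequency
    return ff * irf                                 # Eq. 3.1

# Onion used 10 times in 30 days, but it appears in 500 of 1000 recipes;
# saffron used only twice, yet it appears in just 10 of 1000 recipes.
onion = favourite_score(10, 30, 1000, 500)
saffron = favourite_score(2, 30, 1000, 10)
# Saffron scores higher: its rarity outweighs its lower frequency of use.
```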
The user's disliked ingredients I^-_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I^+_k were computed. The F-measure is computed as follows:
F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}   (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I^+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained from its evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and on the dispersion of the quantity of each ingredient. The standard deviation of an ingredient k is obtained as follows:

¹http://cookpad.com
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}   (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I^+_k and I^-_k, respectively):
Score(R) =sumkisinR
(Ik middotWk) (36)
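This quantity-aware scoring can be sketched as follows. The weight definition used here (growing with the deviation of the recipe's quantity from the ingredient's usual quantity) is an illustrative assumption; the paper's exact W_k scheme is not reproduced, and all the quantities are invented for the example.

```python
# Eq. 3.5 (ingredient quantity standard deviation) and Eq. 3.6
# (weighted sum of ingredient preference scores), with a
# hypothetical deviation-based weight W_k.
import math

def stddev(quantities):
    """Eq. 3.5: population standard deviation of an ingredient's quantities."""
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe_quantities, preference, quantity_stats):
    """recipe_quantities: {ingredient: grams in this recipe};
    preference: {ingredient: I_k score}; quantity_stats: {ingredient: (mean, sigma)}."""
    score = 0.0
    for k, grams in recipe_quantities.items():
        mean, sigma = quantity_stats[k]
        # Hypothetical W_k: grows with the deviation from the usual quantity.
        weight = 1.0 + (abs(grams - mean) / sigma if sigma else 0.0)
        score += preference.get(k, 0.0) * weight
    return score

# 4 g of pepper is far above its usual 2 g, so the dislike weighs heavily.
score = recipe_score(
    {"pepper": 4.0, "potato": 200.0},
    {"pepper": -0.5, "potato": 0.3},
    {"pepper": (2.0, 0.5), "potato": (200.0, 50.0)},
)
```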
The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions using the average of the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:
v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}   (3.7)
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
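The construction of Eq. 3.7 can be sketched as follows; the content-based predictor here is a stand-in returning a constant, whereas in CBCF it would be the learned naive Bayesian classifier:

```python
# Pseudo user-ratings vector (Eq. 3.7): actual ratings where they exist,
# content-based predictions filling the gaps. Stacking these vectors for
# all users yields the dense pseudo ratings matrix V.
def pseudo_vector(user_ratings, all_items, content_predict):
    """user_ratings: {item: rating}; content_predict(item) -> predicted rating."""
    return {i: user_ratings.get(i, content_predict(i)) for i in all_items}

items = ["i1", "i2", "i3"]
ratings = {"i1": 5.0}                        # sparse: only one real rating
dense = pseudo_vector(ratings, items, lambda item: 3.0)
# dense == {"i1": 5.0, "i2": 3.0, "i3": 3.0}
```

The collaborative step then runs unchanged over these dense vectors, which is why the data sparsity problem is reduced.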
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe - ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the rating scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, as common items in recipes with mixed ratings are assumed not to be the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist when performing the recommendation. Items too similar to others known by the user probably carry the same information, and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story to
be classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
does not need a recommendation for a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].
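As a concrete illustration, the thresholded voting scheme just described can be sketched in Python. This is only a minimal sketch under assumed threshold values and data structures (sparse TF-IDF vectors as dicts); it is not Daily-Learner's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse vectors (dicts term -> weight).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def short_term_predict(story, rated_stories, min_sim=0.3, max_sim=0.95):
    """rated_stories: list of (tfidf_vector, score) pairs.
    Stories closer than min_sim become voters; if any voter exceeds
    max_sim, the new story is assumed to be already known to the user."""
    sims = [(cosine(story, vec), score) for vec, score in rated_stories]
    voters = [(s, score) for s, score in sims if s >= min_sim]
    if not voters:
        return "unclassified", None   # falls through to the long-term model
    if any(s >= max_sim for s, _ in voters):
        return "known", None
    # Weighted average of the voters' scores, weighted by similarity.
    pred = sum(s * score for s, score in voters) / sum(s for s, _ in voters)
    return "recommend", pred
```

The `min_sim` and `max_sim` values here are placeholders; the actual thresholds used by Daily-Learner are not given in this excerpt.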
This issue should be taken into consideration in food recommendation, as users are usually not
interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented.
First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental recommendation
component where various approaches are explored to adapt Rocchio's algorithm for personalized
food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative
approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items
is measured from the way they are rated by a shared set of users. In other words, in user-to-user
two users are considered similar if they rate the same set of items in a similar way, whereas in the
item-to-item approach two items are considered similar if they were rated in a similar way by the
1 https://www.python.org/
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / sqrt( Σ_{p∈P} (r_{a,p} − r̄_a)² · Σ_{p∈P} (r_{b,p} − r̄_b)² )    (4.1)
where a and b are recipes, r_{a,p} is the rating given by user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are the average ratings of recipe a and recipe b,
respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = Σ_{b∈N} sim(a, b) · (r_{u,b} − r̄_b) / Σ_{b∈N} sim(a, b)    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items
rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted
according to the similarity between b and the target item a, and the predicted rating is normalized by the
sum of the similarities.
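The similarity and prediction steps above can be sketched as follows. This is an illustrative implementation, not the actual YoLP code: the item averages are taken over all of each item's ratings, the prediction uses a plain weighted average of the user's ratings (one common variant of the formula), and negatively correlated neighbours are dropped.

```python
import math

def item_avg(ratings, i):
    # Average rating of item i over all users that rated it.
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def item_pearson(ratings, a, b):
    """Pearson correlation between items a and b, in the spirit of Eq. 4.1.
    ratings maps user -> {item: rating}; P is the set of co-rating users."""
    P = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not P:
        return 0.0
    ra, rb = item_avg(ratings, a), item_avg(ratings, b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in P)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in P)) * \
          math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in P))
    return num / den if den else 0.0

def predict(ratings, u, a):
    """Predict user u's rating for item a as the similarity-weighted
    average of u's ratings on items similar to a."""
    sims = {b: item_pearson(ratings, a, b) for b in ratings[u] if b != a}
    sims = {b: s for b, s in sims.items() if s > 0}   # keep positive neighbours
    if not sims:
        return None
    return sum(s * ratings[u][b] for b, s in sims.items()) / sum(sims.values())
```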
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes. Recommendations
in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, results of comparable or
better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile using the cosine similarity measure (Eq. 2.5). The recommended
recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
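The profile construction and ranking described above can be sketched as follows, assuming recipes are represented as sets of binary features; the feature names and the rating threshold of 4 used here are illustrative, not the exact YoLP representation.

```python
import math

def profile_from_ratings(rated_recipes, threshold=4):
    """Build a binary user profile: every feature of a positively rated
    recipe (rating >= threshold) is switched on in the profile."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= threshold:
            profile.update(features)
    return profile

def rank_recipes(profile, recipes):
    """Order candidate recipes (dict name -> feature set) by cosine
    similarity between their binary features and the binary profile,
    from most to least similar."""
    def cos(feats):
        if not profile or not feats:
            return 0.0
        # For binary vectors, the dot product is the overlap size.
        return len(profile & feats) / (math.sqrt(len(profile)) * math.sqrt(len(feats)))
    return sorted(recipes, key=lambda r: cos(recipes[r]), reverse=True)
```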
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

Rating = avgTotal + 0.5    if similarity > 0.8
         avgTotal          otherwise                  (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is thus
important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors,
and lastly, the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in
Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature F_k is assumed to
be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
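A sketch of the IRF computation of Eq. 4.4, assuming recipes are given as sets of ingredient identifiers:

```python
import math

def irf_weights(recipes):
    """recipes: iterable of ingredient sets. Returns IRF_k = log(M / M_k)
    for every ingredient k, where M is the total number of recipes and
    M_k the number of recipes containing k; the feature frequency (FF)
    component is fixed to 1, as assumed in the text."""
    recipes = list(recipes)
    M = len(recipes)
    counts = {}
    for ingredients in recipes:
        for k in set(ingredients):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```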
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values are positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach uses the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector.
In positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector; in negative observations, the feature weights are subtracted.
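A minimal sketch of this prototype update, assuming the Epicurious-style fixed threshold (ratings of 3 and above count as positive) and omitting the neutral-rating case used for the Food.com dataset:

```python
def build_prototype(rated_recipes, weights, positive_threshold=3):
    """Rocchio-style prototype: add the IRF weight of every feature of a
    positively rated recipe, subtract it for a negatively rated one.
    rated_recipes is a list of (feature_set, rating) pairs; both
    observation types get an equal weight of 1, as in the experiments."""
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= positive_threshold else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * weights.get(f, 0.0)
    return prototype
```

The second approach described above would simply replace `positive_threshold` with the user's average rating computed from the training set.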
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches to
translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:
B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So, the following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A)
− min(A)), the user average was used as the default for the recommendation.
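The per-user mapping can be sketched as follows; the function signature is illustrative, with the degenerate-interval fallback to the user average described above:

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Map a similarity value from the user's observed similarity range
    [sim_min, sim_max] into the user's rating range [r_min, r_max]
    (Eq. 4.5 with A = sim, C = r_min, D = r_max). Falls back to the
    user average when the similarity interval is degenerate."""
    if sim_max == sim_min:
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min
```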
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:
Rating = average rating + standard deviation    if similarity >= U
         average rating                         if L <= similarity < U
         average rating − standard deviation    if similarity < L      (4.6)
Three different approaches were tested: using the user's rating average and the user standard
deviation; using the recipe's rating average and the recipe standard deviation; and using the combined
average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
Number of users                         24,741         8,117
Number of food items                   226,025        14,976
Number of rating events                956,826        86,574
Number of ratings above avg            726,467        46,588
Number of groups                           108            68
Number of ingredients                    5,074           338
Number of categories                        28            14
Sparsity on the ratings matrix           0.02%         0.07%
Avg rating values                         4.68          3.34
Avg number of ratings per user           38.67         10.67
Avg number of ratings per item            4.23          5.78
Avg number of ingredients per item        8.57          3.71
Avg number of categories per item         2.33          0.60
Avg number of food groups per item        0.87          0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine the final rating
recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L is
0.25.
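The case analysis of Eq. 4.6 is straightforward to sketch; the default thresholds of 0.75 and 0.25 match the initial values given above, and `avg` and `std` can be the user's statistics, the recipe's, or their combination:

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: push the prediction one standard deviation above or below
    the average depending on where the similarity value falls."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```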
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large online4
recipe sharing community. The second dataset is composed of crawled data obtained from a
website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated
recipes, but in order to reduce data sparsity, the dataset has been filtered: all recipes rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table
4.1, a statistical characterization of the two datasets after the filter was applied is presented.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
4 http://www.food.com/
5 http://www.epicurious.com/
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipe features in these datasets is the way the ingredients are
represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen
by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data on the datasets. Figures
4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
6 http://www.mysql.com/
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the data was split into 5 partitions, so the process is repeated 5 times; this is also known
as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
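The 5-fold splitting procedure can be sketched as follows. This is an illustrative partitioning: the text does not specify how the folds were drawn, so a simple striped split of the rating events is assumed here.

```python
def k_fold_splits(events, k=5):
    """Partition the rating events into k folds; each fold serves once as
    the validation set (20% of the data for k = 5), with the remaining
    folds merged into the training set."""
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```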
Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's previously
rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
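The two error measures are simple to state in code; a minimal sketch:

```python
import math

def mae(predicted, actual):
    # Mean absolute error: average deviation between predictions and ratings.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root mean squared error: like MAE, but larger deviations weigh more.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```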
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item averages,
i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious          Food.com
                                 MAE     RMSE       MAE     RMSE
YoLP Content-based component    0.6389  0.8279     0.3590  0.6536
YoLP Collaborative component    0.6454  0.8678     0.3761  0.6834
User Average                    0.6315  0.8338     0.4077  0.6207
Item Average                    0.7701  1.0930     0.4385  0.7043
Combined Average                0.6628  0.8572     0.4180  0.6250
Table 5.2: Test Results

                               Epicurious                          Food.com
                       Observation     Observation        Observation     Observation
                       User Average    Fixed Threshold    User Average    Fixed Threshold
                       MAE     RMSE    MAE     RMSE       MAE     RMSE    MAE     RMSE
User Avg + User
Standard Deviation     0.8217  1.0606  0.7759  1.0283     0.4448  0.6812  0.4287  0.6624
Item Avg + Item
Standard Deviation     0.8914  1.1550  0.8388  1.1106     0.4561  0.7251  0.4507  0.7207
User/Item Avg + User
and Item Standard Dev. 0.8304  1.0296  0.7824  0.9927     0.4390  0.6506  0.4324  0.6449
Min-Max                0.8539  1.1533  0.7721  1.0705     0.6648  0.9847  0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the line entries of
Table 5.2, and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than the fixed
threshold of 3 to separate the positive and negative observations. The second conclusion that can
be drawn from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                   Epicurious          Food.com
                                  MAE     RMSE       MAE     RMSE
Ingredients + Cuisine + Dietary  0.7824  0.9927     0.4324  0.6449
Ingredients + Cuisine            0.7915  1.0012     0.4384  0.6502
Ingredients + Dietary            0.7874  0.9986     0.4342  0.6468
Cuisine + Dietary                0.8266  1.0616     0.4324  0.7087
Ingredients                      0.7932  1.0054     0.4411  0.6537
Cuisine                          0.8553  1.0810     0.5357  0.7431
Dietary                          0.8772  1.0807     0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods, it is important to determine if all features are helping to obtain
the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform
the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform. For each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
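The merging of the three stored per-feature-type vectors can be sketched as follows, assuming feature identifiers are disjoint across types (an ingredient never shares an identifier with a cuisine or a dietary):

```python
def merge_prototypes(stored, feature_types):
    """stored: dict feature_type -> prototype vector (dict feature -> weight),
    e.g. separate vectors for 'ingredients', 'cuisine' and 'dietary'.
    Merging only the selected types reproduces the corresponding full
    prototype without rebuilding it, which is what makes the feature
    tests cheap to run."""
    merged = {}
    for t in feature_types:
        merged.update(stored.get(t, {}))
    return merged
```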
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information about them is available, and although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating = average rating + standard deviation    if similarity >= U
         average rating                         if L <= similarity < U
         average rating − standard deviation    if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but now other cases need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and discover the similarity case thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph from Fig. 5.2 occurs when the lower case (average rating − standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation    if similarity >= U
         average rating                         if similarity < U      (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it is predicting the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates among all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and whether the
absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset.
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users from which to average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
a threshold where the recommendation error stagnates.
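The simulation procedure just described can be sketched as follows. The predictor below is only a stand-in (the mean of the user's training ratings); the actual experiments use the Rocchio-based component, and the function name and toy ratings are illustrative.

```python
# Learning-curve simulation sketch: for each round, one more review moves
# from the validation set into the training set, and the error over the
# remaining (validation) reviews is recorded.

def learning_curve(ratings):
    """ratings: chronologically ordered list of a user's actual ratings."""
    errors = []
    for k in range(1, len(ratings)):
        train, validation = ratings[:k], ratings[k:]
        prediction = sum(train) / len(train)          # placeholder predictor
        mae = sum(abs(prediction - r) for r in validation) / len(validation)
        errors.append(mae)
    return errors  # one MAE value per training-set size

curve = learning_curve([4, 5, 3, 4, 4, 5, 2, 4])
```

Averaging such curves over all users above the chosen review-count threshold yields the plots in Figs. 5.8 to 5.10.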
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
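The prototype-vector construction summarized above can be sketched as follows. The rating threshold of 3, the alpha and beta coefficients, and the toy vectors are illustrative assumptions, not the thesis's tuned values.

```python
import numpy as np

# Sketch of a Rocchio prototype vector: a fixed rating threshold splits the
# user's reviews into positive and negative observations, and the prototype
# is the classic Rocchio combination of the two centroids.

def build_prototype(recipe_vectors, ratings, threshold=3, alpha=1.0, beta=0.75):
    vectors = np.asarray(recipe_vectors, dtype=float)
    ratings = np.asarray(ratings)
    positive = vectors[ratings > threshold]   # liked recipes
    negative = vectors[ratings <= threshold]  # disliked recipes
    prototype = np.zeros(vectors.shape[1])
    if len(positive):
        prototype += alpha * positive.mean(axis=0)
    if len(negative):
        prototype -= beta * negative.mean(axis=0)
    return prototype

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two rated recipes over a 3-feature vocabulary (e.g. three ingredients).
proto = build_prototype([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]], [5, 1])
sim = cosine_similarity(proto, np.array([1.0, 0.0, 1.0]))
```

The similarity value `sim` would then still need to be mapped to a rating, which is the transformation step discussed above.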
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the validation of the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietary attributes), shows that the use of all features combined outperforms every feature individually, as well as other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; the vector with the highest similarity represents the class where the recipe fits best, according to the user's preferences. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
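The class-vector idea could be sketched as follows; this is a speculative illustration of the proposal above, with illustrative names and toy data, not an implemented component of this work.

```python
import numpy as np

# One prototype per rating value, built from the recipes the user rated
# with that value. The predicted rating is the class whose vector is most
# similar to the target recipe, so no similarity-to-rating mapping is needed.

def build_class_vectors(recipe_vectors, ratings):
    classes = {}
    for vec, rating in zip(recipe_vectors, ratings):
        classes.setdefault(rating, []).append(vec)
    return {r: np.mean(vs, axis=0) for r, vs in classes.items()}

def predict_rating(class_vectors, recipe_vector):
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vector))

classes = build_class_vectors([[1, 0, 0], [1, 1, 0], [0, 0, 1]], [5, 5, 1])
predicted = predict_rating(classes, np.array([1.0, 0.5, 0.0]))
```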
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203-259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98-105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76-87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325-341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551-585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412-420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395-403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127-134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381-386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147-180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995. ISSN 1045-0823.
The techniques presented above have traditionally been used to compute similarities between users. Sarwar et al. proposed using the same cosine-based and correlation-based techniques to compute similarities between items instead, later computing ratings from them [15]. Empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
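The item-based idea can be sketched as follows; the toy matrix, zero-as-missing convention, and plain cosine similarity are simplifying assumptions (the cited works also study adjusted and correlation-based variants).

```python
import numpy as np

# Item-based collaborative filtering sketch: similarities are computed
# between item columns of the user-item matrix rather than between user
# rows, and a rating is predicted as a similarity-weighted average of the
# user's own ratings on the other items. Zeros denote missing ratings.

R = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 2.0],
              [1.0, 2.0, 5.0]])

def item_cosine(i, j):
    a, b = R[:, i], R[:, j]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, item):
    num = den = 0.0
    for other in range(R.shape[1]):
        if other != item and R[user, other] > 0:
            sim = item_cosine(item, other)
            num += sim * R[user, other]
            den += sim
    return num / den if den else 0.0

pred = predict(0, 2)  # user 0's predicted rating for item 2
```

Because item-item similarities change more slowly than user-user similarities, they can be precomputed offline, which is the computational advantage mentioned above.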
Model-based algorithms use a collection of ratings (training data) to learn a model, which is then used to make rating predictions. Probabilistic approaches estimate the probability of a certain user c giving a particular rating to item s, given the user's previously rated items. This estimation can be computed, for example, with cluster models, where like-minded users are grouped into classes. The model structure is that of a naive Bayesian model, where the number of classes and the parameters of the model are learned from the data. Other collaborative filtering methods include statistical models, linear regression, Bayesian networks, and various probabilistic modelling techniques, amongst others.
The new user problem, also known as the cold-start problem, also occurs in collaborative methods. The system must first learn the user's preferences from previously rated items in order to perform accurate recommendations. Several techniques have been proposed to address this problem; most of them use the hybrid recommendation approach presented in the next section, while others use strategies based on item popularity, item entropy, user personalization, and combinations of the above [12, 17, 18]. New items also present a problem in collaborative systems: until the new item is rated by a sufficient number of users, the recommender system will not recommend it. Hybrid methods can also address this problem. Data sparsity is another problem that should be considered, since the number of rated items is usually very small when compared to the number of ratings that need to be predicted. User profile information, like age, gender, and other attributes, can also be used when calculating user similarities in order to overcome the problem of rating sparsity.
2.1.3 Hybrid Methods
Content-based and collaborative methods have many positive characteristics, but also several limitations. The idea behind hybrid systems [19] is to combine two or more different elements in order to avoid some shortcomings, and even to reach desirable properties not present in the individual approaches. Monolithic, parallelized, and pipelined approaches are three different hybridization designs commonly used in hybrid recommendation systems [2].
In monolithic hybridization (Figure 2.3), the components of each recommendation strategy are not architecturally separate. The objective behind this design is to exploit different features or knowledge sources from each strategy to generate a recommendation. An example of this design is content-boosted collaborative filtering [20], where social features (e.g., movies liked by a user) can be
Figure 2.3: Monolithic hybridization design [2].
Figure 2.4: Parallelized hybridization design [2].
Figure 2.5: Pipelined hybridization designs [2].
associated with content features (e.g., comedies liked by a user, or dramas liked by a user) in order to improve the results.
Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components take the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation; weights can be assigned manually or learned dynamically. This design can be applied when two components perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
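The weighting scheme of the parallelized design reduces to a simple combination step; a minimal sketch, where the two component predictors and the weights are placeholders:

```python
# Parallelized hybridization sketch: two independent recommenders produce
# predictions from the same input, and a (here static) weighting scheme
# combines them into the final prediction.

def weighted_hybrid(predictions_a, predictions_b, weight_a=0.6, weight_b=0.4):
    return [weight_a * a + weight_b * b
            for a, b in zip(predictions_a, predictions_b)]

combined = weighted_hybrid([4.0, 2.0, 5.0], [3.0, 3.0, 4.0])
```

In a dynamic variant, the weights themselves would be learned, e.g. by minimizing the error of the combined prediction on held-out ratings.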
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are applied sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by hybridizing them with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4].
content/collaborative hybrids, regardless of type, will always exhibit the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increases in the number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user, or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2].
$$\text{Precision} = \frac{tp}{tp + fp} \quad (2.13)$$
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

$$\text{Recall} = \frac{tp}{tp + fn} \quad (2.14)$$
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:
$$MAE = \frac{1}{n}\sum_{i=1}^{n} |p_i - r_i| \quad (2.15)$$
In the formula, $n$ represents the total number of items used in the calculation, $p_i$ the predicted rating for item $i$, and $r_i$ the actual rating for item $i$. RMSE is similar to MAE, but places more emphasis on larger deviations:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - r_i)^2} \quad (2.16)$$
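Both measures follow directly from their definitions; a minimal sketch over a toy list of predicted and actual ratings:

```python
import math

# MAE and RMSE computed directly from Eqs. (2.15) and (2.16).

def mae(predicted, actual):
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

p, r = [4.0, 3.0, 5.0, 2.0], [5.0, 3.0, 4.0, 4.0]
# Absolute errors 1, 0, 1, 2 give MAE = 1.0; squared errors 1, 0, 1, 4
# give RMSE = sqrt(1.5), larger than the MAE because the single deviation
# of 2 is weighted more heavily.
```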
The RMSE measure was used in the famous Netflix Prize competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.
¹ http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from the browsing history (i.e., recipes searched) and the cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients, $I^{+}_{k}$, an equation based on the idea of TF-IDF is used:
$$I^{+}_{k} = FF_k \times IRF_k \quad (3.1)$$
$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$$FF_k = \frac{F_k}{D} \quad (3.2)$$
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe Frequency, $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$$IRF_k = \log \frac{M}{M_k} \quad (3.3)$$
The user's disliked ingredients, $I^{-}_{k}$, are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top $N$ ingredients, sorted by $I^{+}_{k}$, were computed. The F-measure is computed as follows:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3.4)$$
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%; however, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^{+}_{k}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user dislikes an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
¹ http://cookpad.com
ingredients cannot be considered equivalent; i.e., 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
$$\sigma_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(g_k(i) - \overline{g}_k\right)^2} \quad (3.5)$$
In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of ingredient $k$ in recipe $i$, and $\overline{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient $k$ over all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The recipe's final score $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^{+}_{k}$ and $I^{-}_{k}$, respectively):
$$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k) \quad (3.6)$$
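A sketch of these two steps, with toy quantities; since the exact mapping from deviation to $W_k$ is not detailed here, the weight below is an assumed placeholder (the number of standard deviations from the usual quantity):

```python
import math

# Quantity-aware scoring sketch: the standard deviation of an ingredient's
# quantity across recipes (Eq. 3.5) measures its dispersion, and a recipe's
# score (Eq. 3.6) sums preference values I_k weighted by W_k.

def quantity_std(quantities):
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def ingredient_weight(quantity, mean, std):
    # Assumed W_k: distance from the usual quantity, in standard deviations.
    return abs(quantity - mean) / std if std else 0.0

def recipe_score(ingredient_prefs, weights):
    # Score(R) = sum over k of I_k * W_k
    return sum(ingredient_prefs[k] * weights[k] for k in ingredient_prefs)

std_pepper = quantity_std([2.0, 4.0, 6.0])            # grams across recipes
w = ingredient_weight(6.0, 4.0, std_pepper)           # unusually peppery recipe
score = recipe_score({"pepper": -0.5}, {"pepper": w}) # disliked -> negative score
```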
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by user $u$, where available, and of those predicted by the content-based method otherwise:
$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \quad (3.7)$$
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix $V$ is created, and the similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction.
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned, including cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendation.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
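The content-based breakdown just described can be sketched as follows, with toy data; the averaging of recipe ratings into ingredient scores matches the simplistic decomposition the authors describe, while the final averaging step is one plausible reconstruction.

```python
# Ingredient-breakdown sketch: each ingredient's score is the average
# rating of the user's rated recipes containing it, and a new recipe's
# prediction averages the scores of its known ingredients.

def ingredient_scores(rated_recipes):
    """rated_recipes: list of (set_of_ingredients, rating) pairs."""
    totals = {}
    for ingredients, rating in rated_recipes:
        for ing in ingredients:
            count_sum = totals.setdefault(ing, [0, 0.0])
            count_sum[0] += 1
            count_sum[1] += rating
    return {ing: s / n for ing, (n, s) in totals.items()}

def predict(recipe_ingredients, scores):
    known = [scores[i] for i in recipe_ingredients if i in scores]
    return sum(known) / len(known) if known else None

scores = ingredient_scores([({"egg", "flour"}, 4), ({"egg", "sugar"}, 2)])
prediction = predict({"egg", "flour"}, scores)
```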
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix; this matrix is then used by the collaborative approach to generate recommendations. The two strategies differ in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that ingredients common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22].
MAE as an evaluation metric
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others already seen should not be recommended at all. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should make great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
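The voting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Daily-Learner's actual implementation: the function names and the threshold values are hypothetical, and TF-IDF vectors are assumed to be precomputed.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length TF-IDF vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_story_score(new_story, rated_stories, min_sim=0.2, max_sim=0.95):
    # rated_stories: list of (tfidf_vector, score) pairs already seen by the user.
    voters = [(cosine(new_story, v), s) for v, s in rated_stories]
    voters = [(sim, s) for sim, s in voters if sim >= min_sim]
    if not voters:
        return None  # no voters: defer to the long-term model
    if any(sim >= max_sim for sim, _ in voters):
        return "known"  # too close to a seen story: the user already knows the event
    total = sum(sim for sim, _ in voters)
    # Weighted average of the voters' scores, weighted by similarity.
    return sum(sim * s for sim, s in voters) / total
```

The two thresholds reproduce the two cases in the text: a story without voters falls through to the long-term model, and a story with a near-duplicate voter is labeled as known instead of receiving a score.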
This issue should be taken into consideration in food recommendations as well, since users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, while in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

¹https://www.python.org

Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \quad (4.1)$$

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

$$\mathrm{pred}(u, a) = \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,(r_{b,u} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)} \quad (4.2)$$
²http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then computed as the weighted sum, normalized by the sum of the similarities.
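A minimal sketch of this item-to-item pipeline follows, assuming ratings are held in memory as nested dictionaries (user → item → rating). It implements the Pearson similarity of Eq. 4.1; for the prediction, a plain weighted average of the user's ratings is used instead of the mean-centered form, and the absolute similarity is used in the denominator as a common guard against negative correlations, so this is an illustrative variant rather than the exact formula above.

```python
import math

def item_avg(ratings, i):
    # Average rating of item i over all users who rated it.
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def item_pearson(ratings, a, b):
    # Pearson correlation (Eq. 4.1) over the users P who rated both a and b,
    # using each item's overall average rating.
    shared = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not shared:
        return 0.0
    ma, mb = item_avg(ratings, a), item_avg(ratings, b)
    num = sum((ratings[u][a] - ma) * (ratings[u][b] - mb) for u in shared)
    da = math.sqrt(sum((ratings[u][a] - ma) ** 2 for u in shared))
    db = math.sqrt(sum((ratings[u][b] - mb) ** 2 for u in shared))
    return num / (da * db) if da and db else 0.0

def predict_rating(ratings, user, target):
    # Similarity-weighted average of the user's ratings on other items.
    num = den = 0.0
    for b, r in ratings[user].items():
        if b == target:
            continue
        sim = item_pearson(ratings, target, b)
        num += sim * r
        den += abs(sim)
    return num / den if den else None
```

Recomputing the Pearson correlation on every prediction is deliberate here for brevity; a real system would precompute the item-item similarity matrix.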
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
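The profile construction and ranking described above can be sketched as follows. Since the profile and recipe vectors are binary, they can be represented as Python sets, in which case the cosine similarity reduces to the size of the intersection over the product of the vector norms. The feature names and the positive-rating threshold are illustrative, and context features are omitted.

```python
import math

def build_profile(rated_recipes, like_threshold=4):
    # Union of feature sets from positively rated recipes, as a binary profile.
    profile = set()
    for features, rating in rated_recipes:
        if rating >= like_threshold:
            profile.update(features)
    return profile

def cosine_sets(a, b):
    # Cosine similarity between two binary vectors represented as sets.
    if not a or not b:
        return 0.0
    return len(a & b) / (math.sqrt(len(a)) * math.sqrt(len(b)))

def rank_recipes(profile, candidates):
    # Order candidate recipes (name -> feature set) from most to least similar.
    return sorted(candidates,
                  key=lambda r: cosine_sets(profile, candidates[r]),
                  reverse=True)
```

Using sets keeps the sketch short; a production system would keep the positions of the features fixed in sparse vectors, as the text describes.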
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$$\mathrm{Rating} = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \quad (4.3)$$
where avgTotal represents the combined user and item average for each recommendation. It is important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature F_k is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
$$IRF_k = \log \frac{M}{M_k} \quad (4.4)$$

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
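A minimal sketch of computing the IRF weights of Eq. 4.4 over the complete dataset, assuming each recipe is represented as a set of features (ingredients, cuisines, dietaries):

```python
import math

def irf_weights(recipes):
    # recipes: dict recipe_id -> set of features.
    # IRF_k = log(M / M_k), where M is the number of recipes and
    # M_k the number of recipes containing feature k.
    m = len(recipes)
    counts = {}
    for feats in recipes.values():
        for k in feats:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(m / mk) for k, mk in counts.items()}
```

A feature that appears in every recipe gets weight 0, so ubiquitous features contribute nothing to the prototype vectors built next.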
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
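The update rule just described can be sketched as follows, for the fixed-threshold variant: a positive observation adds the recipe's IRF weights to the prototype, a negative one subtracts them, and both carry equal weight 1. The function name and the threshold default are illustrative (for Food.com, ratings of 3 would additionally be skipped as neutral).

```python
def build_prototype(rated_recipes, irf, pos_threshold=3):
    # rated_recipes: list of (feature_set, rating); irf: feature -> IRF weight.
    prototype = {}
    for features, rating in rated_recipes:
        # Positive observation adds the weights, negative subtracts them.
        sign = 1.0 if rating >= pos_threshold else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype
```

Features that are liked and disliked equally often cancel out, which is exactly the behaviour Rocchio's prototype vector is meant to capture.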
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C \quad (4.5)$$
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
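The per-user mapping can be sketched as follows, assuming the user's similarity values and rating values are passed in as lists; the fallback to the user's average rating mirrors the degenerate case described above. The function name is illustrative.

```python
def min_max_rating(sim, user_sims, user_ratings):
    # Map a similarity value into the user's own rating range (Eq. 4.5):
    # B = (A - min A) / (max A - min A) * (D - C) + C.
    a_min, a_max = min(user_sims), max(user_sims)
    c, d = min(user_ratings), max(user_ratings)
    if a_max == a_min:
        # Not enough variation to build the interval: fall back to the
        # user's average rating.
        return sum(user_ratings) / len(user_ratings)
    return (sim - a_min) / (a_max - a_min) * (d - c) + c
```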
Using average and standard deviation values from training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \quad (4.6)$$
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combination of the user's and the recipe's averages and standard deviations.
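A sketch of the rule in Eq. 4.6, with the average and standard deviation passed in as arguments so that any of the three variants (user, recipe, or combined) can reuse it; the function name is illustrative.

```python
def similarity_to_rating(sim, avg, std, upper=0.75, lower=0.25):
    # Eq. 4.6: push the prediction one standard deviation above or below
    # the average, depending on where the similarity value falls.
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```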
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performances, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.

³http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
    Number of users                       24,741         8,117
    Number of food items                 226,025        14,976
    Number of rating events              956,826        86,574
    Number of ratings above avg          726,467        46,588
    Number of groups                         108            68
    Number of ingredients                  5,074           338
    Number of categories                      28            14
    Sparsity on the ratings matrix         0.02%         0.07%
    Avg rating values                       4.68          3.34
    Avg number of ratings per user         38.67         10.67
    Avg number of ratings per item          4.23          5.78
    Avg number of ingredients per item      8.57          3.71
    Avg number of categories per item       2.33          0.60
    Avg number of food groups per item      0.87          0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large on-line⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization for the two datasets is presented after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴http://www.food.com
⁵http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating values for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
⁶http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
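The 5-fold scheme can be sketched as follows: each rating event appears in exactly one validation fold (20% of the data), and the remaining folds form the training set (80%). The partitioning strategy shown here (round-robin slicing) is an illustrative assumption; shuffled splits are equally valid.

```python
def five_fold_splits(events, k=5):
    # Partition the rating events into k folds; each fold serves once as
    # the validation set while the remaining folds form the training set.
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, f in enumerate(folds) if j != i for e in f]
        yield training, validation
```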
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

Figure 5.1: 10-fold cross-validation example
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
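The two metrics can be sketched directly from their definitions, given paired lists of actual and predicted ratings:

```python
import math

def mae(actual, predicted):
    # Mean absolute error between actual and predicted ratings.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean squared error; penalizes large deviations more than MAE.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```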
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                      Epicurious            Food.com
                                      MAE       RMSE        MAE       RMSE
    YoLP Content-based component      0.6389    0.8279      0.3590    0.6536
    YoLP Collaborative component      0.6454    0.8678      0.3761    0.6834
    User Average                      0.6315    0.8338      0.4077    0.6207
    Item Average                      0.7701    1.0930      0.4385    0.7043
    Combined Average                  0.6628    0.8572      0.4180    0.6250
Table 5.2: Test Results

                                  Epicurious                              Food.com
                                  Obs. User Avg    Obs. Fixed Thresh.     Obs. User Avg    Obs. Fixed Thresh.
                                  MAE     RMSE     MAE     RMSE           MAE     RMSE     MAE     RMSE
    User Avg + User Standard
    Deviation                     0.8217  1.0606   0.7759  1.0283         0.4448  0.6812   0.4287  0.6624
    Item Avg + Item Standard
    Deviation                     0.8914  1.1550   0.8388  1.1106         0.4561  0.7251   0.4507  0.7207
    User/Item Avg + User and
    Item Standard Deviation       0.8304  1.0296   0.7824  0.9927         0.4390  0.6506   0.4324  0.6449
    Min-Max                       0.8539  1.1533   0.7721  1.0705         0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                           Epicurious            Food.com
                                           MAE       RMSE        MAE       RMSE
    Ingredients + Cuisine + Dietaries      0.7824    0.9927      0.4324    0.6449
    Ingredients + Cuisine                  0.7915    1.0012      0.4384    0.6502
    Ingredients + Dietary                  0.7874    0.9986      0.4342    0.6468
    Cuisine + Dietary                      0.8266    1.0616      0.4324    0.7087
    Ingredients                            0.7932    1.0054      0.4411    0.6537
    Cuisine                                0.8553    1.0810      0.5357    0.7431
    Dietary                                0.8772    1.0807      0.4579    0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the best-performing method combination was the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
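The merging step can be sketched as follows, assuming the three stored vectors are kept in a dictionary keyed by feature type and that feature keys do not collide across types (both the names and the storage layout are illustrative):

```python
def merge_feature_vectors(user_vectors, active=("ingredients", "cuisine", "dietary")):
    # The prototype is stored as one sparse vector per feature type;
    # feature testing just merges the selected subset before computing
    # the cosine similarity against the recipe's features.
    merged = {}
    for name in active:
        merged.update(user_vectors.get(name, {}))
    return merged
```

Each feature combination in Table 5.3 then corresponds to a different `active` tuple, with no prototype rebuilding required.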
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}$$
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
$$\mathrm{Rating} = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \quad (5.1)$$
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE was 0.6230.
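The opposite movement of the two error metrics can be reproduced with a toy example: lowering the threshold produces more exact predictions but larger individual misses, which MAE rewards and RMSE penalizes. The two error vectors below are illustrative, not taken from the datasets.

```python
import math

def mae(errors):
    """Mean absolute error over a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error over the same list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Higher threshold U: no exact hits, but every miss is small.
high_u = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
# Lower threshold U: five exact hits, one large miss.
low_u = [0.0, 0.0, 0.0, 0.0, 0.0, 1.5]

print(mae(high_u), rmse(high_u))  # ~0.30 and ~0.30
print(mae(low_u), rmse(low_u))    # 0.25 and ~0.61: MAE drops, RMSE rises
```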
Compared directly with the YoLP content-based component, which obtained the lowest overall error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to that user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 presents the data from the Epicurious dataset. The result for this dataset was as expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of standard deviation would not be good, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset, and the lower density of points in the graph towards the higher standard deviation values, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, showing that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users from which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
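The round-by-round protocol described above (move one review from the validation set into the training set, retrain, and measure the error) can be sketched as follows. The `train_and_predict` learner below is a stand-in, an assumption for illustration: it simply predicts the mean of the training ratings instead of running the Rocchio component.

```python
def learning_curve(ratings, train_and_predict):
    """Simulate incremental learning: at round t the first t reviews form the
    training set and the remainder the validation set; return the MAE
    measured at each round."""
    curve = []
    for t in range(1, len(ratings)):
        train, validation = ratings[:t], ratings[t:]
        predict = train_and_predict(train)
        errors = [abs(predict(item) - r) for item, r in validation]
        curve.append(sum(errors) / len(errors))
    return curve

# Stand-in learner: ignores the item and predicts the mean training rating.
def mean_learner(train):
    mean = sum(r for _, r in train) / len(train)
    return lambda item: mean

ratings = [("r1", 4), ("r2", 5), ("r3", 4), ("r4", 3), ("r5", 4)]
print(learning_curve(ratings, mean_learner))  # one MAE value per round
```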
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. (4.4)). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations demonstrated the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
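The pipeline summarized above can be sketched under simplifying assumptions: recipes are bags of ingredient features weighted by Inverse Recipe Frequency, the prototype adds positively rated recipes and subtracts negatively rated ones (separated by a fixed rating threshold), and candidates are scored by cosine similarity against the prototype. The ingredient data and the threshold of 3 are illustrative, not taken from the thesis experiments.

```python
import math
from collections import Counter

def irf_weights(recipes):
    """Inverse Recipe Frequency: log(M / M_k) for each ingredient k."""
    m = len(recipes)
    df = Counter(ing for recipe in recipes for ing in set(recipe))
    return {ing: math.log(m / count) for ing, count in df.items()}

def prototype(rated, irf, threshold=3):
    """Rocchio-style prototype: add IRF-weighted vectors of recipes rated
    above the threshold, subtract those rated at or below it."""
    proto = Counter()
    for recipe, rating in rated:
        sign = 1 if rating > threshold else -1
        for ing in recipe:
            proto[ing] += sign * irf.get(ing, 0.0)
    return proto

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

recipes = [{"beef", "onion"}, {"tofu", "onion"}, {"beef", "pepper"}]
irf = irf_weights(recipes)
user = prototype([({"beef", "onion"}, 5), ({"tofu", "onion"}, 1)], irf)
candidate = Counter({ing: irf[ing] for ing in {"beef", "pepper"}})
print(cosine(user, candidate))  # positive: the user leans towards beef recipes
```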
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information contained only the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information available for recipes in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in Computer Science and Information Systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http://link.springer.com/10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link.springer.com/10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
Figure 2.3: Monolithic hybridization design [2]
Figure 2.4: Parallelized hybridization design [2]
Figure 2.5: Pipelined hybridization designs [2]
associated with content features (e.g., comedies liked by the user, or dramas liked by the user) in order to improve the results.

Parallelized hybridization (Figure 2.4) is the least invasive design, given that the recommendation components have the same input as if they were working independently. A weighting or a voting scheme is then applied to obtain the recommendation. Weights can be assigned manually or learned dynamically. This design can be applied to two components that perform well individually but complement each other in different situations (e.g., when few ratings exist, recommend popular items; otherwise, use collaborative methods).
Pipelined hybridization designs (Figure 2.5) implement a process in which several techniques are used sequentially to generate recommendations. Two types of strategies are used [2]: cascade and meta-level. Cascade hybrids are based on a sequenced order of techniques, in which each succeeding recommender only refines the recommendations of its predecessor. In a meta-level hybridization design, one recommender builds a model that is then exploited by the principal component to make recommendations.
In the practical development of recommendation systems, it is well accepted that all base algorithms can be improved by being hybridized with other techniques. It is important that the recommendation techniques used in the hybrid system complement each other's limitations. For instance,
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem, since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the business perspective, many variables can be, and have been, studied: increases in number of sales, profits, and item popularity are some example measures that can be applied in practice. From the platform perspective, the general interactivity with the platform and click-through rates can be analysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of recommendation systems are based on Information Retrieval (IR) measures, such as Precision and Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where recommended items, like retrieved items, are predicted to be good or relevant. Items are then classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known as true positives (tp), occur when the recommended item is liked by the user or established as "actually good" by a human expert in the item domain. False negatives (fn) represent items liked by the user that were not recommended by the system. False positives (fp) designate recommended items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
\[
\text{Precision} = \frac{tp}{tp + fp}
\tag{2.13}
\]
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):
\[
\text{Recall} = \frac{tp}{tp + fn}
\tag{2.14}
\]
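Both measures follow directly from the confusion counts above; the counts in the usage example are illustrative.

```python
def precision(tp, fp):
    """Fraction of recommended items that were actually relevant (Eq. 2.13)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant items that were actually recommended (Eq. 2.14)."""
    return tp / (tp + fn)

# 8 good recommendations, 2 bad ones, 4 relevant items missed:
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 8/12 ~= 0.667
```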
Measures such as the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing the accuracy at the level of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
\[
MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|
\tag{2.15}
\]
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:
\[
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}
\tag{2.16}
\]
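Eqs. (2.15) and (2.16) translate directly into code; the rating lists below are illustrative.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16): squaring penalizes large misses."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.5, 5.0, 2.0]
actual = [4.0, 3.0, 4.0, 4.0]
print(mae(predicted, actual))   # (0 + 0.5 + 1 + 2) / 4 = 0.875
print(rmse(predicted, actual))  # sqrt((0 + 0.25 + 1 + 4) / 4) ~= 1.146
```

Note how the single large miss (2.0 predicted against 4.0 actual) weighs far more in RMSE than in MAE.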
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy (RMSE) improvement of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.

¹http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to explore further in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I+_k, an equation based on the idea of TF-IDF is used:
\[
I^{+}_{k} = FF_k \times IRF_k
\tag{3.1}
\]
FF_k is the frequency of use, F_k, of ingredient k during a period D:
\[
FF_k = \frac{F_k}{D}
\tag{3.2}
\]
The notion of IDF (inverse document frequency) is captured in Eq. (3.3) through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes M_k that contain ingredient k:
\[
IRF_k = \log \frac{M}{M_k}
\tag{3.3}
\]
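Eqs. (3.1)–(3.3) combine into a single preference score; the usage counts, period, and corpus sizes below are illustrative, not from the cited study.

```python
import math

def favourite_score(f_k, period, m_total, m_with_k):
    """I+_k = FF_k * IRF_k, with FF_k = F_k / D and IRF_k = log(M / M_k):
    frequent personal use of a globally rare ingredient scores highest."""
    ff = f_k / period
    irf = math.log(m_total / m_with_k)
    return ff * irf

# An ingredient used 12 times over a 30-day period,
# appearing in 50 of 1000 recipes:
print(favourite_score(12, 30, 1000, 50))  # 0.4 * log(20) ~= 1.198
```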
The user's disliked ingredients I−_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I+_k were computed. The F-measure is computed as follows:
\[
\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\tag{3.4}
\]
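Eq. (3.4) is the harmonic mean of precision and recall; as a quick check, plugging in the N = 20 figures reported below reproduces the reported F-measure.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (Eq. 3.4)."""
    return 2 * precision * recall / (precision + recall)

# The N = 20 setting reported in the study: precision 60.7%, recall 61%:
print(f_measure(0.607, 0.61))  # ~= 0.608, matching the reported 60.8%
```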
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained in its evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the extended system also considers the ingredient quantities of the target recipe.
¹http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent: 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, since the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\[
\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_k(i) - \bar{g}_k\right)^2}
\tag{3.5}
\]
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \bar{g}_k represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredient scores I_k (i.e., I+_k and I−_k, respectively):
\[
\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k)
\tag{3.6}
\]
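A sketch of the quantity-aware scoring in Eqs. (3.5)–(3.6). The exact mapping from deviation to weight W_k is not detailed in this summary, so the weight used below (standard deviation relative to the ingredient's mean quantity) is an assumption for illustration, as is the toy data.

```python
import math

def ingredient_std(quantities):
    """Standard deviation of an ingredient's quantity across recipes (Eq. 3.5)."""
    mean = sum(quantities) / len(quantities)
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / len(quantities))

def recipe_score(recipe_ingredients, preference, weight):
    """Score(R) = sum over ingredients k in R of I_k * W_k (Eq. 3.6)."""
    return sum(preference[k] * weight[k] for k in recipe_ingredients)

# Illustrative data: pepper varies much more, relative to its mean
# quantity, than potato does.
quantities = {"pepper": [2, 10, 4], "potato": [100, 120, 110]}
preference = {"pepper": -0.8, "potato": 0.5}  # this user dislikes pepper
# Assumed weighting: deviation relative to the ingredient's mean quantity.
weight = {k: ingredient_std(q) / (sum(q) / len(q)) for k, q in quantities.items()}
print(recipe_score({"pepper", "potato"}, preference, weight))  # negative
```

With this weighting, the disliked, high-variation pepper dominates the score, which is the behaviour the extended method aims for.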
The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendation was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both of these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u where available, and of those predicted by the content-based method otherwise:
\[
v_{u,i} = \begin{cases}
r_{u,i}, & \text{if user } u \text{ rated item } i \\
c_{u,i}, & \text{otherwise}
\end{cases}
\tag{3.7}
\]
Using the pseudo user-ratings vectors of all users, a dense pseudo-ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
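The densification step of Eq. (3.7) can be sketched as follows; `content_predict` stands in for the pure content-based predictor and is an assumption for illustration.

```python
def pseudo_ratings(user_ratings, all_items, content_predict):
    """Build the dense pseudo user-ratings vector: keep the user's actual
    ratings where available and fill every gap with a content-based
    prediction (Eq. 3.7)."""
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in all_items
    }

# Stand-in content predictor: always predicts 3.0.
dense = pseudo_ratings({"m1": 5, "m3": 2}, ["m1", "m2", "m3", "m4"],
                       lambda item: 3.0)
print(dense)  # {'m1': 5, 'm2': 3.0, 'm3': 2, 'm4': 3.0}
```

With every user's vector densified this way, user-to-user Pearson correlations can be computed without missing values.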
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the naive hybrid approach obtained a MAE of 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned, including cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendation.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks each recipe down into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
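The ingredient break-down strategy described above can be sketched directly: an ingredient's score is the average rating of the user's rated recipes containing it, and a recipe prediction averages the scores of its known ingredients. The recipe data below is illustrative.

```python
def ingredient_scores(rated_recipes):
    """Score each ingredient as the average rating of the recipes it occurs in."""
    totals, counts = {}, {}
    for ingredients, rating in rated_recipes:
        for ing in ingredients:
            totals[ing] = totals.get(ing, 0) + rating
            counts[ing] = counts.get(ing, 0) + 1
    return {ing: totals[ing] / counts[ing] for ing in totals}

def predict(recipe, scores):
    """Predict a recipe rating from the user's known ingredient scores."""
    known = [scores[ing] for ing in recipe if ing in scores]
    return sum(known) / len(known) if known else None

rated = [({"beef", "onion"}, 5), ({"beef", "pepper"}, 3), ({"tofu"}, 1)]
scores = ingredient_scores(rated)
print(predict({"onion", "pepper"}, scores))  # (5 + 3) / 2 = 4.0
```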
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ratings obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered; the assumption is that the common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are presented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implements a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating which could be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should make great recommendations in the context of this system. The use of similarity can thus be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[ sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \tag{4.1} \]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \, (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)} \tag{4.2} \]
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u,a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of the similarities.
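As an illustrative sketch, Eq. 4.1 and Eq. 4.2 could be implemented as follows; the `ratings` layout (user → {recipe: rating}) and the function names are assumptions for this example, not the actual YoLP code:

```python
from math import sqrt

def avg_rating(ratings, item):
    # Average rating of an item over all users who rated it
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def item_similarity(ratings, a, b):
    # P: the users who rated both recipe a and recipe b (Eq. 4.1)
    shared = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not shared:
        return 0.0
    ra, rb = avg_rating(ratings, a), avg_rating(ratings, b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in shared)
    den = (sqrt(sum((ratings[u][a] - ra) ** 2 for u in shared))
           * sqrt(sum((ratings[u][b] - rb) ** 2 for u in shared)))
    return num / den if den else 0.0

def predict(ratings, user, target):
    # Weighted vote over the user's other rated items N (Eq. 4.2)
    num = den = 0.0
    for b, r_ub in ratings[user].items():
        if b == target:
            continue
        s = item_similarity(ratings, target, b)
        if s > 0:
            num += s * (r_ub - avg_rating(ratings, b))
            den += s
    return num / den if den else 0.0
```

Only positively correlated neighbors vote here, a common simplification; the thesis formula itself does not restrict the sign of the similarity.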
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
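A minimal sketch of this profile building and cosine matching, representing the binary sparse vectors as Python sets (the feature names and the helper names are illustrative, not the actual YoLP implementation):

```python
from math import sqrt

def build_profile(rated_recipes):
    # rated_recipes: list of (features, rating) pairs, where features is
    # a set of strings (category, region, ingredients, ...)
    profile = set()
    for features, rating in rated_recipes:
        if rating >= 4:          # positively rated recipes only
            profile |= features  # binary profile: a feature is present or not
    return profile

def cosine(profile, recipe_features):
    # Cosine similarity between two binary vectors reduces to
    # overlap / (sqrt(|profile|) * sqrt(|recipe|))
    if not profile or not recipe_features:
        return 0.0
    overlap = len(profile & recipe_features)
    return overlap / (sqrt(len(profile)) * sqrt(len(recipe_features)))
```

Ranking the restaurant's recipes by `cosine(profile, recipe)` in descending order then yields the most-to-least-similar list described above.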
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

\[ Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \tag{4.3} \]
Here, avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
\[ IRF_k = \log\frac{M}{M_k} \tag{4.4} \]

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
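Under the assumption above (FF fixed at 1), the feature weights reduce to the IRF of Eq. 4.4. An illustrative sketch of that computation, with recipes represented as sets of feature names:

```python
from math import log

def irf_weights(recipes):
    # recipes: list of feature sets; M is the total number of recipes
    M = len(recipes)
    counts = {}
    for recipe in recipes:
        for k in recipe:
            counts[k] = counts.get(k, 0) + 1   # M_k: recipes containing k
    # IRF_k = log(M / M_k); features present in every recipe get weight 0
    return {k: log(M / Mk) for k, Mk in counts.items()}
```

Note that a feature appearing in all recipes receives weight 0 and rarer features receive higher weights, mirroring the IDF intuition.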
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula3:

\[ B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \tag{4.5} \]
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were thus applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
Using the average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

\[ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \tag{4.6} \]
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com   Epicurious
Number of users                       24741      8117
Number of food items                  226025     14976
Number of rating events               956826     86574
Number of ratings above avg           726467     46588
Number of groups                      108        68
Number of ingredients                 5074       338
Number of categories                  28         14
Sparsity on the ratings matrix        0.02%      0.07%
Avg rating values                     4.68       3.34
Avg number of ratings per user        38.67      10.67
Avg number of ratings per item        4.23       5.78
Avg number of ingredients per item    8.57       3.71
Avg number of categories per item     2.33       0.60
Avg number of food groups per item    0.87       0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
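The rule of Eq. 4.6, with these initial thresholds, can be sketched as follows (an illustrative rendering, not the thesis code):

```python
def rating_from_similarity(similarity, avg, std, U=0.75, L=0.25):
    # Eq. 4.6: shift the average up or down by one standard deviation
    # depending on where the similarity falls relative to U and L
    if similarity >= U:
        return avg + std
    if similarity >= L:
        return avg
    return avg - std
```

Here `avg` and `std` can be the user's, the recipe's, or the combined statistics, matching the three approaches listed above.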
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values
Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds, so the process is repeated 5 times, a setting also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
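The 5-fold procedure can be sketched as follows; the round-robin assignment of rating events to folds is an illustrative choice, not necessarily the split used in the actual experiments:

```python
def k_fold(events, k=5):
    # Partition the rating events into k folds (round-robin), then use
    # each fold once as the validation set (20% for k = 5) and the
    # remaining folds as the training set (80%)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

The per-fold error values are then averaged over the k repetitions, as described above.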
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
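The two metrics, in their standard formulation, could be computed as in this sketch (illustrative code, not YoLP's evaluation module itself):

```python
from math import sqrt

def mae(predicted, actual):
    # Mean absolute deviation between predicted and actual ratings
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root mean squared error: squaring places more emphasis
    # on higher deviations than MAE does
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For any set of predictions, RMSE is at least as large as MAE, which is why the two metrics can move in opposite directions in the threshold tests of Section 5.4.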
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                Epicurious          Food.com
                                MAE       RMSE      MAE       RMSE
YoLP Content-based component    0.6389    0.8279    0.3590    0.6536
YoLP Collaborative component    0.6454    0.8678    0.3761    0.6834
User Average                    0.6315    0.8338    0.4077    0.6207
Item Average                    0.7701    1.0930    0.4385    0.7043
Combined Average                0.6628    0.8572    0.4180    0.6250
Table 5.2: Test Results

                                      Epicurious                                    Food.com
                                      Obs. User Average   Obs. Fixed Threshold     Obs. User Average   Obs. Fixed Threshold
                                      MAE       RMSE      MAE       RMSE           MAE       RMSE      MAE       RMSE
User Avg + User Standard Deviation    0.8217    1.0606    0.7759    1.0283         0.4448    0.6812    0.4287    0.6624
Item Avg + Item Standard Deviation    0.8914    1.1550    0.8388    1.1106         0.4561    0.7251    0.4507    0.7207
User/Item Avg + User and Item
Standard Deviation                    0.8304    1.0296    0.7824    0.9927         0.4390    0.6506    0.4324    0.6449
Min-Max                               0.8539    1.1533    0.7721    1.0705         0.6648    0.9847    0.6303    0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 for separating positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                     Epicurious          Food.com
                                     MAE       RMSE      MAE       RMSE
Ingredients + Cuisine + Dietaries    0.7824    0.9927    0.4324    0.6449
Ingredients + Cuisine                0.7915    1.0012    0.4384    0.6502
Ingredients + Dietary                0.7874    0.9986    0.4342    0.6468
Cuisine + Dietary                    0.8266    1.0616    0.4324    0.7087
Ingredients                          0.7932    1.0054    0.4411    0.6537
Cuisine                              0.8553    1.0810    0.5357    0.7431
Dietary                              0.8772    1.0807    0.4579    0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
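This per-feature storage and on-demand merging scheme could be sketched as follows; the dictionary layout and names are illustrative, not the actual implementation:

```python
def merge_vectors(stored, active_features):
    # stored: one sparse vector per feature type, e.g.
    # {'ingredients': {...}, 'cuisine': {...}, 'dietary': {...}}
    # active_features: the feature types enabled for this test run
    merged = {}
    for name in active_features:
        merged.update(stored[name])
    return merged
```

Each feature combination of Table 5.3 then corresponds to a different `active_features` list, with no need to rebuild the underlying vectors.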
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

\[ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \]
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
\[ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \tag{5.1} \]
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better re-
sults when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute
error and standard deviation values. The line in these two graphs indicates the average value of
the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset,
and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users
with standard deviations higher than 1. This implies that the algorithm is learning the user's
preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objec-
tive of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a certain amount
of reviews are made. In order to perform this test, the datasets were first analysed to find a group of
users with enough recipes rated to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable amount of users from which to
average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found that rated
over 100 recipes, and since the results of this experiment showed a consistent drop in the errors
measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500
recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes
error and, after 25 recipes rated, the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
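The round-by-round procedure described in this section can be sketched as follows. This is a simplified, self-contained illustration: a mean-rating predictor stands in for the Rocchio prototype vector, so only the train/validation bookkeeping mirrors the actual experiment.

```python
# Minimal sketch of the learning-curve simulation. The real experiment builds
# a Rocchio prototype vector from the training reviews; here a simple
# mean-rating predictor stands in for it so the example is self-contained.
def learning_curve(reviews):
    """reviews: chronologically ordered list of (recipe_id, rating) pairs.

    For each round, one more review is moved from the validation set to the
    training set, and the MAE on the remaining validation reviews is recorded.
    """
    errors = []
    for round_size in range(1, len(reviews)):
        training = reviews[:round_size]        # grows by one review per round
        validation = reviews[round_size:]      # shrinks accordingly
        model = sum(r for _, r in training) / len(training)  # stand-in predictor
        mae = sum(abs(model - r) for _, r in validation) / len(validation)
        errors.append(mae)
    return errors

user_reviews = [("r1", 4), ("r2", 5), ("r3", 4), ("r4", 3), ("r5", 4)]
print(learning_curve(user_reviews))
```

Averaging these per-round errors over all users in the selected group produces curves like the ones in Figs. 5.8 to 5.10.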
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recom-
mendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so vari-
ous approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the recommen-
dation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. These approaches combined returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results
in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information
only contained the main ingredients, which were chosen by the user at the moment of the review, as
opposed to the full ingredient information that recipes have in the Food.com dataset. This removes
a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in
the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the validation of the studied approaches. Since
there are very few studies related to food recommendations, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, and other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method ex-
plored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example, season of the year (i.e., winter/fall or summer/spring), time of the day (i.e.,
lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these
features have on the recommendation is also another interesting point to approach in the future,
when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically assign it a predicted
rating.
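This future-work idea can be sketched in a few lines. The class vectors, feature names, and weights below are invented for illustration; the prediction is simply the rating whose class vector is most similar to the recipe vector.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating_by_class(class_vectors, recipe_vector):
    """class_vectors: {rating: feature-weight dict} built from the user's reviews.

    The rating whose class vector is most similar to the recipe's feature
    vector is returned directly as the prediction (no similarity-to-rating
    transformation is needed).
    """
    return max(class_vectors,
               key=lambda r: cosine(class_vectors[r], recipe_vector))

classes = {
    5: {"chocolate": 1.0, "sugar": 0.8},  # features of recipes the user rated 5
    2: {"broccoli": 1.0, "tofu": 0.6},    # features of recipes the user rated 2
}
print(predict_rating_by_class(classes, {"chocolate": 0.9, "sugar": 0.4}))  # prints 5
```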
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868.
doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction,
volume 40. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105,
2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender Systems in Computer
Science and Information Systems - A Landscape of Research. In E-Commerce and Web
Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3.
doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL
http://link.springer.com/10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web,
4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL
http://link.springer.com/10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive
Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL
http://dl.acm.org/citation.cfm?id=1248566.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN
0897915240.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Cat-
egorization. In Proceedings of the Fourteenth International Conference on Machine
Learning, pages 412–420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence, pages 43–52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A
Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender
Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Eval-
uation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403,
1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recom-
mendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL
http://dl.acm.org/citation.cfm?id=372071.
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Trans-
actions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to Know You: Learning New User Preferences in Recommender Systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN
1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collab-
orative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for im-
proved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by
Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523,
2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredi-
ents. In Proceedings of the 18th International Conference on User Modeling, Adaptation
and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi:
10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Trans-
actions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation. Volume 8444, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN
1045-0823.
Figure 2.6: Popular evaluation measures in studies about recommendation systems, from the area of Computer Science (CS) or the area of Information Systems (IS) [4]
content/collaborative hybrids, regardless of type, will always demonstrate the cold-start problem,
since both techniques need a database of ratings [19].
2.2 Evaluation Methods in Recommendation Systems
Recommendation systems can be evaluated from numerous perspectives. For instance, from the
business perspective, many variables can be and have been studied: increase in number of sales,
profits, and item popularity are some example measures that can be applied in practice. From the
platform perspective, the general interactivity with the platform and click-through rates can be anal-
ysed. From the customer perspective, satisfaction levels, loyalty, and return rates represent valuable
feedback. Still, as shown in Figure 2.6, the most popular forms of evaluation in the area of rec-
ommendation systems are based on Information Retrieval (IR) measures, such as Precision and
Recall [2].
When using IR measures, the recommendation is viewed as an information retrieval task, where
recommended items, like retrieved items, are predicted to be good or relevant. Items are then
classified with one of four possible states, as shown in Figure 2.7. Correct predictions, also known
as true positives (tp), occur when the recommended item is liked by the user, or established as
"actually good" by a human expert in the item domain. False negatives (fn) represent items liked by
the user that were not recommended by the system. False positives (fp) designate recommended
items disliked by the user. Finally, correct omissions, also known as true negatives (tn), represent
items correctly not recommended by the system.
Precision measures the exactness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all recommended items (tp + fp):
Figure 2.7: Evaluating recommended items [2]
\[
\text{Precision} = \frac{tp}{tp + fp}
\tag{2.13}
\]
Recall measures the completeness of the recommendations, i.e., the fraction of relevant items
recommended (tp) out of all relevant items (tp + fn):
\[
\text{Recall} = \frac{tp}{tp + fn}
\tag{2.14}
\]
Measures such as the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) are
also very popular in the evaluation of recommendation systems, capturing the accuracy at the level
of the predicted ratings. MAE computes the deviation between predicted ratings and actual ratings:
\[
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |p_i - r_i|
\tag{2.15}
\]
In the formula, n represents the total number of items used in the calculation, p_i the predicted rating
for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on
larger deviations:
\[
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - r_i)^2}
\tag{2.16}
\]
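The two error measures can be computed directly from their definitions. The toy ratings below are invented; note how a single large miss leaves the MAE moderate but dominates the RMSE.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15)."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16): larger deviations weigh more."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]
actual    = [4.0, 3.0, 5.0, 5.0]   # three exact hits, one miss of 3 stars
print(mae(predicted, actual))      # 0.75
print(rmse(predicted, actual))     # 1.5 (the single large miss dominates)
```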
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000
would be given to anyone who presented an algorithm with an accuracy improvement (RMSE) of
10% compared to Netflix's own recommendation algorithm at the time, called Cinematch.
1 http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The
works described in this chapter contain interesting features to further explore in the context of per-
sonalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe
Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cook-
ing history (i.e., recipes actually cooked), the system described in [3] recommends recipes that
score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite
ingredients I_k^+, an equation based on the idea of TF-IDF is used:
\[
I_k^+ = FF_k \times IRF_k
\tag{3.1}
\]
FF_k is the frequency of use (F_k) of ingredient k during a period D:
\[
FF_k = \frac{F_k}{D}
\tag{3.2}
\]
The notion of IDF (inverse document frequency) is specified in Eq. (3.3) through the Inverse Recipe
Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain
ingredient k (M_k):
\[
IRF_k = \log \frac{M}{M_k}
\tag{3.3}
\]
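Eqs. (3.1) to (3.3) can be combined in a short sketch. The recipe catalogue and user history below are made up; recipes are reduced to ingredient sets for simplicity.

```python
from math import log

def favourite_ingredient_scores(user_recipes, all_recipes, period_days):
    """Score each ingredient as FF_k * IRF_k (Eqs. 3.1-3.3).

    user_recipes: recipes (ingredient sets) the user interacted with in the period.
    all_recipes:  every recipe in the collection, as ingredient sets.
    """
    M = len(all_recipes)
    scores = {}
    ingredients = {k for recipe in user_recipes for k in recipe}
    for k in ingredients:
        Fk = sum(k in recipe for recipe in user_recipes)   # uses in the period
        FFk = Fk / period_days                             # Eq. 3.2
        Mk = sum(k in recipe for recipe in all_recipes)    # recipes containing k
        IRFk = log(M / Mk)                                 # Eq. 3.3
        scores[k] = FFk * IRFk                             # Eq. 3.1
    return scores

catalogue = [{"egg", "flour"}, {"egg", "milk"}, {"chili", "beef"},
             {"chili", "egg"}, {"egg", "beef"}]
history = [{"chili", "beef"}, {"chili", "egg"}]
scores = favourite_ingredient_scores(history, catalogue, period_days=7)
print(max(scores, key=scores.get))  # "chili": frequent for this user, rare overall
```

As with TF-IDF, a common ingredient such as egg scores low even when the user cooks with it, because its IRF is small.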
The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing
history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were
used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half
million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was
presented each time, and subjects would choose one recipe they would like to browse completely and one
recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was
exhausted. The labelled data for the users' preferences was collected via a questionnaire. Responses
were coded on a 6-point scale, ranging from love to hate. To evaluate the estimation of the user's
favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted
by I_k^+, were computed. The F-measure is computed as follows:
\[
\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\tag{3.4}
\]
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient
with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following
values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20),
although the precision dropped to 60.7%, the recall increased to 61%, since the average number
of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was
recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system
should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of
the user's disliked ingredients is not explained here in more detail, because the accuracy values
obtained from the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not.
This means that two recipes composed of the same set of ingredients have exactly the same score,
even if they contain different ingredient proportions. This method does not correspond to real eating
habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe
with the higher quantity of k should have a lower score. To improve this method, an extension of this
work was published in 2014 [21], using the same methods to estimate the user's preferences. When
performing a recommendation, the system now also considers the ingredient quantity of a target
recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper have a higher impact on a
recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient
pepper is higher. Therefore, the scoring method proposed in this work is based on the standard
quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is
obtained as follows:
\[
\sigma_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(g_k(i) - \overline{g_k}\right)^2}
\tag{3.5}
\]
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quan-
tity of the ingredient k in recipe i, and \overline{g_k} represents the average of g_k(i) (i.e., the previously computed
average quantity of the ingredient k over all the recipes in the database). According to the deviation
score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering
the weight W_k and the user's liked and disliked ingredient scores I_k (i.e., I_k^+ and I_k^-, respectively):
\[
\text{Score}(R) = \sum_{k \in R} I_k \cdot W_k
\tag{3.6}
\]
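Eqs. (3.5) and (3.6) can be sketched together. How W_k is derived from the deviation score is not detailed in this summary, so the weight values below are invented for illustration only.

```python
from math import sqrt

def ingredient_std(quantities):
    """Standard deviation of an ingredient's quantity across recipes (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, user_pref, weights):
    """Score(R) = sum over k in R of I_k * W_k (Eq. 3.6).

    user_pref holds positive (liked) and negative (disliked) ingredient scores;
    weights holds the quantity-deviation weights W_k, which here are made-up
    values (the mapping from sigma_k to W_k is not specified in this summary).
    """
    return sum(user_pref.get(k, 0.0) * weights.get(k, 1.0)
               for k in recipe_ingredients)

print(ingredient_std([100, 120, 80]))  # grams of one ingredient in three recipes
print(recipe_score({"pepper", "potato"},
                   {"pepper": -0.5, "potato": 0.3},   # disliked / liked scores
                   {"pepper": 2.0, "potato": 0.5}))   # illustrative W_k values
```

With a high weight on the disliked pepper, the recipe's score is pushed clearly negative, matching the intuition that high-variation disliked quantities should hurt the score most.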
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite
ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter
4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods
[20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure
content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is
also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly
reduces the impact that sparse data has on the prediction accuracy. The domain of movie rec-
ommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization prob-
lem. The movie content information was viewed as a document, and the user ratings, between 0
and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to
represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each
feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of
unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based
algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the pre-
diction is computed as the weighted average of deviations from the neighbors' means. Both these
methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure
content-based predictor and the pure collaborative method.
CBCF basically consists in performing a collaborative recommendation with less data sparsity.
This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The
pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and
of those predicted by the content-based method otherwise:
\[
v_{u,i} =
\begin{cases}
r_{u,i} & \text{if user } u \text{ rated item } i \\
c_{u,i} & \text{otherwise}
\end{cases}
\tag{3.7}
\]
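Eq. (3.7) is just a fill-in step, as this sketch shows. The constant content predictor below is a stand-in for the naive Bayes profile; all item names and values are invented.

```python
def pseudo_ratings(user_ratings, content_predict, items):
    """Build a user's dense pseudo-ratings vector (Eq. 3.7): actual ratings
    where available, content-based predictions otherwise."""
    return {i: user_ratings.get(i, content_predict(i)) for i in items}

# Hypothetical content-based predictor standing in for the learned user profile.
content_predict = lambda item: 3.0

ratings = {"movie_a": 5.0, "movie_c": 1.0}          # sparse actual ratings
items = ["movie_a", "movie_b", "movie_c", "movie_d"]
print(pseudo_ratings(ratings, content_predict, items))
# {'movie_a': 5.0, 'movie_b': 3.0, 'movie_c': 1.0, 'movie_d': 3.0}
```

Stacking these dense vectors for all users yields the pseudo-ratings matrix V on which the Pearson-based collaborative step operates.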
Using the pseudo user-ratings vectors of all users, the dense pseudo-ratings matrix V is created.
The similarity between users is then computed with the Pearson correlation coefficient. The accuracy
of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the
user has rated many items, the content-based predictions are significantly better than if he has rated
only a few items. Lastly, the prediction is computed using a hybrid correlation
weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the
predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The
content-boosted collaborative filtering system presented the best results, with a MAE of 0.962.
The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and
1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods, since it has been shown to perform consistently
better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredi-
ents
A previous article has studied the applicability of recommender techniques in the food and diet
domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender al-
gorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus
of this article is the content, or ingredients, of a meal, various other variables that impact a user's
opinion in food recommendation are mentioned. These other variables include cooking methods,
ingredient costs and quantities, preparation time, and ingredient combination effects, amongst oth-
ers. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is
simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly
generated prediction score to a recipe. Five different recommendation strategies were developed for
personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based
on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which
breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores.
Finally, with the user's ingredient scores, a prediction is computed for the recipe.
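The break-down-and-reconstruct step of the content-based strategy can be sketched as below. The recipes, ratings, and the default score are made-up examples; the averaging follows the simplistic scheme described above.

```python
def ingredient_scores(user_recipe_ratings, recipe_ingredients):
    """Score each ingredient as the average rating of the user's recipes
    in which it occurs (the break-down step of [22])."""
    totals, counts = {}, {}
    for recipe, rating in user_recipe_ratings.items():
        for k in recipe_ingredients[recipe]:
            totals[k] = totals.get(k, 0.0) + rating
            counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

def predict(recipe, scores, recipe_ingredients, default=3.0):
    """Reconstruct a recipe prediction by averaging its ingredient scores."""
    vals = [scores.get(k, default) for k in recipe_ingredients[recipe]]
    return sum(vals) / len(vals)

recipe_ingredients = {
    "pancakes": {"egg", "flour", "milk"},
    "omelette": {"egg", "cheese"},
    "quiche":   {"egg", "flour", "cheese"},
}
scores = ingredient_scores({"pancakes": 4.0, "omelette": 2.0}, recipe_ingredients)
print(predict("quiche", scores, recipe_ingredients))  # (3.0 + 4.0 + 2.0) / 3 = 3.0
```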
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a
simple pipelined hybrid design, where the content-based approach provides predictions for missing
ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then
used by the collaborative approach to generate recommendations. These strategies differentiate
from one another by the approach used to compute user similarity. The hybrid recipe method iden-
tifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity
is based on the ingredient scores after the recipes are broken down. Lastly, an intelligent strategy
was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are
considered, the assumption being that the common items in recipes with mixed ratings are not the
cause of the high variation in score. The results of the study are represented in Figure 3.2, using
the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance,
with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the
authors concluded that this work implemented a simplistic version of what a recipe recommender
needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating
that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not
the only ones that matter when calculating a prediction. In some cases, items that are too similar
to others which have already been seen should not be recommended either. This idea is used
in Daily-Learner [23], a well-known content-based news article recommendation system. When
helping the user to obtain more knowledge about a news topic, a certain variety should exist in
the recommendations. Items too similar to others known by the user probably carry the
same information and will not help the user gather more information about a particular news topic;
these items are then excluded from the recommendation. On the other hand, items similar in topic,
but not similar in content, should be great recommendations in the context of this system. Therefore,
the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm
to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor
algorithms simply store all training data in memory. When classifying a new, unlabeled item, the
algorithm compares it to all stored items using a similarity function, and then determines the nearest
neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a
user's novel interests. The main advantage of the nearest-neighbor approach is that only a single
story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (e.g., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (e.g., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
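The voting scheme described above can be sketched as follows. This is a minimal illustration rather than Daily-Learner's actual implementation: the function and threshold names are hypothetical, and stories are assumed to arrive as pre-computed TF-IDF vectors stored as {term: weight} dictionaries.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as {term: weight} dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_short_term(story, rated_stories, min_sim=0.3, max_sim=0.95):
    """Simplified sketch of a Daily-Learner-style short-term prediction.

    rated_stories: list of (tfidf_vector, score) pairs already seen by the user.
    Returns ('known', None), ('score', value) or ('unclassified', None).
    """
    voters = []
    for vec, score in rated_stories:
        sim = cosine(story, vec)
        if sim >= max_sim:
            return ('known', None)       # too close to a story the user already saw
        if sim >= min_sim:
            voters.append((sim, score))  # close enough to vote
    if not voters:
        return ('unclassified', None)    # falls through to the long-term model
    total = sum(sim for sim, _ in voters)
    return ('score', sum(sim * score for sim, score in voters) / total)
```

The two thresholds play opposite roles: the minimum similarity admits voters, while the maximum similarity filters out stories the user presumably already knows.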
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach, explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

¹ https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = [ Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) ] / [ √(Σ_{p∈P} (r_{a,p} − r̄_a)²) · √(Σ_{p∈P} (r_{b,p} − r̄_b)²) ]    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly r̄_a and r̄_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = [ Σ_{b∈N} sim(a, b) · (r_{u,b} − r̄_b) ] / [ Σ_{b∈N} sim(a, b) ]    (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then obtained by normalizing with the sum of the similarities.
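Eqs. 4.1 and 4.2 can be sketched in Python as follows. This is an illustrative sketch rather than the YoLP code: the item averages inside the correlation are taken over the co-rating users (one common convention), and the prediction adds the target recipe's own average back onto the weighted deviation (an assumption of this sketch) so that the result lands on the rating scale.

```python
import math

def item_avg(ratings, item):
    # Average rating of `item` over every user that rated it.
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def pearson_item_sim(ratings, a, b):
    # Eq. 4.1: Pearson correlation between recipes a and b, computed over
    # the set P of users that rated both. `ratings` maps user -> {recipe: rating}.
    shared = [u for u, r in ratings.items() if a in r and b in r]
    if len(shared) < 2:
        return 0.0
    avg_a = sum(ratings[u][a] for u in shared) / len(shared)
    avg_b = sum(ratings[u][b] for u in shared) / len(shared)
    num = sum((ratings[u][a] - avg_a) * (ratings[u][b] - avg_b) for u in shared)
    den = (math.sqrt(sum((ratings[u][a] - avg_a) ** 2 for u in shared))
           * math.sqrt(sum((ratings[u][b] - avg_b) ** 2 for u in shared)))
    return num / den if den else 0.0

def predict(ratings, user, a):
    # Eq. 4.2: similarity-weighted average of the mean-centred ratings over
    # the set N of recipes rated by `user`, normalized by the sum of the
    # (positive) similarities, then recentred on recipe a's average.
    num = den = 0.0
    for b in ratings[user]:
        if b == a:
            continue
        s = pearson_item_sim(ratings, a, b)
        if s <= 0:
            continue
        num += s * (ratings[user][b] - item_avg(ratings, b))
        den += s
    return item_avg(ratings, a) + (num / den if den else 0.0)
```

With perfectly correlated items, the prediction reduces to the target item's average shifted by the user's deviation on the neighboring item.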
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
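A minimal sketch of this profile-building and matching step, assuming recipes are represented as sets of binary feature labels (the feature names and the helper functions below are hypothetical):

```python
import math

def build_profile(rated_recipes, positive_threshold=4):
    # A recipe is assumed to be a set of feature labels such as
    # {'cat:dessert', 'reg:mediterranean', 'rest:42', 'ing:almond'}.
    # Features of every positively rated recipe (rating >= threshold)
    # are added to the user profile as binary entries.
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile |= features
    return profile

def binary_cosine(a, b):
    # Cosine similarity between two binary vectors represented as sets:
    # |intersection| / sqrt(|a| * |b|).
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Candidate recipes would then be ranked by `binary_cosine(profile, recipe_features)`, from most to least similar.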
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = avgTotal + 0.5,  if similarity > 0.8
Rating = avgTotal,        otherwise    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating the ingredients in a recipe like the words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature F_k is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
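Eq. 4.4 can be sketched as follows; the function name is hypothetical, and since the feature frequency is fixed at 1, the IRF value doubles as the full feature weight:

```python
import math

def irf_weights(recipes):
    # Eq. 4.4: IRF_k = log(M / M_k), where M is the total number of recipes
    # and M_k is the number of recipes containing feature k.
    # `recipes` is a list of feature sets, one per recipe.
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```

A feature present in every recipe gets weight log(1) = 0, while rare features get the highest weights, mirroring the IDF intuition.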
4.3.2 Building the Users' Prototype Vectors

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a Rating Value from a Similarity Value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes and contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were thus applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,  if similarity ≥ U
Rating = average rating,                       if L ≤ similarity < U
Rating = average rating − standard deviation,  if similarity < L    (4.6)

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
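Eq. 4.6 translates directly into code. The function name is hypothetical and the default thresholds shown here (0.75 and 0.25) are the ones used in the first experiments:

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    # Eq. 4.6: a high similarity pushes the prediction one standard
    # deviation above the average, a low similarity one below, and
    # anything in between falls back to the average itself.
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```

Depending on the approach tested, `avg` and `std` are the user's, the recipe's, or the combined user/recipe statistics computed from the training set.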
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                       Food.com    Epicurious
  Number of users                        24,741         8,117
  Number of food items                  226,025        14,976
  Number of rating events               956,826        86,574
  Number of ratings above avg           726,467        46,588
  Number of groups                          108            68
  Number of ingredients                   5,074           338
  Number of categories                       28            14
  Sparsity of the ratings matrix          0.02%         0.07%
  Avg rating value                         4.68          3.34
  Avg number of ratings per user          38.67         10.67
  Avg number of ratings per item           4.23          5.78
  Avg number of ingredients per item       8.57          3.71
  Avg number of categories per item        2.33          0.60
  Avg number of food groups per item       0.87          0.61
4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community⁴. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data on the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds, so the process is repeated 5 times; this is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
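The 5-fold protocol can be sketched as follows. This is an illustrative partition of the rating events (a real split would typically shuffle the events first); the function name is hypothetical.

```python
def k_fold_splits(events, k=5):
    # Partition the rating events into k disjoint validation folds;
    # each round trains on the other k-1 folds (80% train / 20% validation
    # for k = 5) and validates on the held-out fold.
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, f in enumerate(folds) if j != i for e in f]
        yield training, validation
```

Averaging the error measures over the k rounds gives the cross-validated estimate reported in the tables below.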
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
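As a concrete reference, the two measures can be sketched directly:

```python
import math

def mae(predicted, actual):
    # Mean absolute error: average deviation between predicted and actual ratings.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root mean squared error: like MAE, but larger deviations weigh more heavily
    # because the deviations are squared before averaging.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

The squaring inside RMSE is what makes it more sensitive to occasional large misses, a distinction that becomes important in the threshold-variation analysis of Section 5.4.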
5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious            Food.com
                                  MAE      RMSE         MAE      RMSE
  YoLP Content-based component    0.6389   0.8279       0.3590   0.6536
  YoLP Collaborative component    0.6454   0.8678       0.3761   0.6834
  User Average                    0.6315   0.8338       0.4077   0.6207
  Item Average                    0.7701   1.0930       0.4385   0.7043
  Combined Average                0.6628   0.8572       0.4180   0.6250

Table 5.2: Test Results

                                                      Epicurious                            Food.com
                                                      Obs. User Avg     Obs. Fixed Thr.     Obs. User Avg     Obs. Fixed Thr.
                                                      MAE     RMSE      MAE     RMSE        MAE     RMSE      MAE     RMSE
  User Avg + User Standard Deviation                  0.8217  1.0606    0.7759  1.0283      0.4448  0.6812    0.4287  0.6624
  Item Avg + Item Standard Deviation                  0.8914  1.1550    0.8388  1.1106      0.4561  0.7251    0.4507  0.7207
  User/Item Avg + User and Item Standard Deviation    0.8304  1.0296    0.7824  0.9927      0.4390  0.6506    0.4324  0.6449
  Min-Max                                             0.8539  1.1533    0.7721  1.0705      0.6648  0.9847    0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                       Epicurious            Food.com
                                       MAE      RMSE         MAE      RMSE
  Ingredients + Cuisine + Dietaries    0.7824   0.9927       0.4324   0.6449
  Ingredients + Cuisine                0.7915   1.0012       0.4384   0.6502
  Ingredients + Dietary                0.7874   0.9986       0.4342   0.6468
  Cuisine + Dietary                    0.8266   1.0616       0.4324   0.7087
  Ingredients                          0.7932   1.0054       0.4411   0.6537
  Cuisine                              0.8553   1.0810       0.5357   0.7431
  Dietary                              0.8772   1.0807       0.4579   0.7320
5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated: in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,  if similarity ≥ U
Rating = average rating,                       if L ≤ similarity < U
Rating = average rating − standard deviation,  if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset

From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:

Rating = average rating + standard deviation,  if similarity ≥ U
Rating = average rating,                       if similarity < U    (5.1)

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation for the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error

When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 presents the data from the Epicurious dataset. The result for this dataset was as expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be undesirable, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimension of this dataset, and the lower density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of each user's absolute error and standard deviation in the Food.com dataset.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
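The incremental train/validation procedure described above can be sketched as follows. `build_model` and `evaluate` are hypothetical placeholders for the prototype-vector construction and error measurement used in this work; the toy example below just predicts the user's running average rating.

```python
def learning_curve(user_reviews, build_model, evaluate):
    """For each round, move one more review from the validation set into the
    training set, rebuild the model, and record the validation error."""
    errors = []
    for n in range(1, len(user_reviews)):
        train, validation = user_reviews[:n], user_reviews[n:]
        model = build_model(train)            # e.g. the user's prototype vector
        errors.append(evaluate(model, validation))
    return errors

# Toy illustration: the "model" is the user's average rating so far,
# and the error is the MAE of predicting that average for the rest.
reviews = [("r1", 4), ("r2", 2), ("r3", 3)]
avg = lambda train: sum(r for _, r in train) / len(train)
mae = lambda m, val: sum(abs(m - r) for _, r in val) / len(val)
curve = learning_curve(reviews, avg, mae)
```

The returned list gives one error value per training-set size, which is the shape of data plotted in Figs. 5.8 to 5.10.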
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results when transforming the similarity value into a rating value. Combined, these approaches returned the best performance values for the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained the user reviews needed to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

This method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Figure 2.7: Evaluating recommended items [2]

\[ \text{Precision} = \frac{tp}{tp + fp} \qquad (2.13) \]

Recall measures the completeness of the recommendations, i.e., the fraction of relevant items recommended (tp) out of all relevant items (tp + fn):

\[ \text{Recall} = \frac{tp}{tp + fn} \qquad (2.14) \]
Measures such as the Mean Absolute Error (MAE) or the Root Mean Square Error (RMSE) are also very popular in the evaluation of recommendation systems, capturing accuracy at the level of the predicted ratings. MAE computes the average deviation between predicted ratings and actual ratings:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i| \qquad (2.15) \]

In the formula, n represents the total number of items used in the calculation, p_i the predicted rating for item i, and r_i the actual rating for item i. RMSE is similar to MAE, but places more emphasis on larger deviations:

\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2} \qquad (2.16) \]
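As a minimal sketch, the two measures can be computed directly from lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error (Eq. 2.15): average absolute deviation."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error (Eq. 2.16): squaring emphasizes larger deviations."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

p, r = [4.0, 3.5, 2.0, 5.0], [4.0, 3.0, 4.0, 5.0]
print(mae(p, r))   # 0.625
print(rmse(p, r))  # ~1.031, larger than the MAE because of the single big miss
```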
The RMSE measure was used in the famous Netflix competition¹, where a prize of $1,000,000 would be given to anyone who presented an algorithm with an accuracy (RMSE) improvement of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.

1. http://www.netflixprize.com
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described here contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I⁺_k, an equation based on the idea of TF-IDF is used:

\[ I^{+}_{k} = FF_k \times IRF_k \qquad (3.1) \]

FF_k is the frequency of use (F_k) of ingredient k during a period D:

\[ FF_k = \frac{F_k}{D} \qquad (3.2) \]

The notion of IDF (inverse document frequency) is specified, as in Eq. (4.4), through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

\[ IRF_k = \log \frac{M}{M_k} \qquad (3.3) \]

The user's disliked ingredients I⁻_k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
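A minimal sketch of this estimation, assuming recipes are represented simply as sets of ingredient names; the function and variable names are illustrative, not taken from [3], and every cooked ingredient is assumed to appear in the recipe database:

```python
import math
from collections import Counter

def favourite_ingredient_scores(cooked_recipes, all_recipes, period_days):
    """Estimate I+_k = FF_k * IRF_k (Eqs. 3.1-3.3) for each ingredient k.

    cooked_recipes: ingredient sets the user cooked during the period.
    all_recipes:    ingredient sets of every recipe in the database.
    """
    M = len(all_recipes)
    # F_k: how many times the user used ingredient k in the period
    freq = Counter(k for recipe in cooked_recipes for k in recipe)
    # M_k: number of database recipes containing ingredient k
    recipe_count = Counter(k for recipe in all_recipes for k in recipe)
    scores = {}
    for k, f_k in freq.items():
        ff = f_k / period_days                  # Eq. 3.2
        irf = math.log(M / recipe_count[k])     # Eq. 3.3
        scores[k] = ff * irf                    # Eq. 3.1
    return scores

scores = favourite_ingredient_scores(
    cooked_recipes=[{"chicken", "salt"}, {"chicken", "rice"}],
    all_recipes=[{"salt", "chicken"}, {"salt", "beef"},
                 {"salt", "tofu"}, {"chicken", "rice"}],
    period_days=7)
# "chicken" scores above "salt": cooked more often, and rarer in the database
```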
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad¹, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients sorted by I⁺_k were computed. The F-measure is computed as follows:

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3.4) \]
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I⁺_k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantities of a target recipe.
1. http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
\[ \sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_k(i) - \overline{g_k} \right)^2} \qquad (3.5) \]
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of ingredient k in recipe i, and \overline{g_k} represents the average of g_k(i) (i.e., the previously computed average quantity of ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score, Score(R), is computed considering the weights W_k and the user's liked and disliked ingredient scores I_k (i.e., I⁺_k and I⁻_k, respectively):

\[ \text{Score}(R) = \sum_{k \in R} I_k \cdot W_k \qquad (3.6) \]
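The two formulas above can be sketched as follows. Since the article does not fully specify how the weight W_k is derived from the deviation score, the weights are taken here as a given mapping; all names are illustrative:

```python
import math

def std_quantity(quantities):
    """Standard deviation of an ingredient's quantity over the n recipes
    that contain it (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe, ingredient_scores, weights):
    """Score(R): sum of I_k * W_k over the ingredients k in the recipe (Eq. 3.6).
    Unknown ingredients default to a score of 0 and a neutral weight of 1."""
    return sum(ingredient_scores.get(k, 0.0) * weights.get(k, 1.0) for k in recipe)

sigma = std_quantity([2, 4, 4, 4, 5, 5, 7, 9])          # 2.0
score = recipe_score({"pepper", "potato"},
                     {"pepper": 0.5, "potato": 0.2},
                     {"pepper": 2.0, "potato": 1.0})    # 0.5*2.0 + 0.2*1.0 = 1.2
```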
The TF-IDF-inspired approach shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.

The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by user u where available, and of those predicted by the content-based method otherwise:

\[ v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases} \qquad (3.7) \]
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created, and the similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
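The construction of the pseudo user-ratings vector in Eq. (3.7) is straightforward to sketch; the dictionary-based representation below is illustrative, not the paper's:

```python
def pseudo_ratings_vector(user_ratings, content_predictions, items):
    """Build v_u (Eq. 3.7): the actual rating where the user rated the item,
    and the content-based prediction c_ui otherwise."""
    return [user_ratings.get(i, content_predictions[i]) for i in items]

items = ["m1", "m2", "m3"]
v = pseudo_ratings_vector(user_ratings={"m1": 5, "m3": 2},
                          content_predictions={"m1": 4.1, "m2": 3.2, "m3": 2.4},
                          items=items)
# v == [5, 3.2, 2]: a dense vector with no missing entries, ready for
# the Pearson-correlation step of the collaborative stage
```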
The MAE was one of the two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively, while the MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned, including cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
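The breakdown-and-reconstruction idea of this content-based strategy can be sketched as follows; the function names are illustrative and the paper's exact weighting is not reproduced:

```python
from collections import defaultdict

def ingredient_scores(rated_recipes):
    """Break down rated recipes: each ingredient is scored as the average
    rating of the recipes in which it occurs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for ingredients, rating in rated_recipes:
        for k in ingredients:
            totals[k] += rating
            counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}

def predict_recipe(recipe, scores):
    """Reconstruct a prediction for an unseen recipe from its known ingredients."""
    known = [scores[k] for k in recipe if k in scores]
    return sum(known) / len(known) if known else None

scores = ingredient_scores([({"pasta", "tomato"}, 5), ({"tomato", "olive"}, 1)])
pred = predict_recipe({"pasta", "olive"}, scores)
# pasta scores 5, olive scores 1, so the unseen recipe predicts (5 + 1) / 2 = 3.0
```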
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. The two strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, assuming that the items common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction: in some cases, items that are too similar to others already seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information, and will not help the user gather more information about a particular news topic, so these items are excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. The use of similarity can therefore be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests: the main advantage of the nearest-neighbor approach is that only a single story on a new topic is needed for the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average of all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and a story the user already knows does not need to be recommended. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].

This issue should also be taken into consideration in food recommendation, since users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
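The short-term voting scheme can be sketched as follows, assuming the stories are already TF-IDF vectors; "closer than a threshold" is interpreted here as cosine similarity above that value, and all names are illustrative rather than Daily-Learner's actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_story(new_vec, rated_stories, min_sim, max_sim):
    """Stories above min_sim become voters; a voter above max_sim marks the
    story as already known; no voters defers to the long-term model."""
    voters = [(cosine(new_vec, vec), score) for vec, score in rated_stories
              if cosine(new_vec, vec) >= min_sim]
    if not voters:
        return None                              # pass to the long-term model
    if any(sim >= max_sim for sim, _ in voters):
        return "known"                           # too similar: do not recommend
    total = sum(sim for sim, _ in voters)
    return sum(sim * score for sim, score in voters) / total

stories = [([1.0, 0.0, 0.0], 4.0), ([0.0, 1.0, 0.0], 2.0)]
score = predict_story([1.0, 0.2, 0.0], stories, min_sim=0.3, max_sim=0.99)
# only the first story votes, so the prediction follows its score of 4.0
```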
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt the Rocchio algorithm to personalized food recommendation. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics; the methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured from the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, while in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

1. https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[ \text{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \overline{r_a})(r_{b,p} - \overline{r_b})}{\sqrt{\sum_{p \in P} (r_{a,p} - \overline{r_a})^2 \sum_{p \in P} (r_{b,p} - \overline{r_b})^2}} \qquad (4.1) \]

where a and b are recipes, r_{a,p} is the rating given by user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \overline{r_a} and \overline{r_b} are the average ratings of recipes a and b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\[ \text{pred}(u, a) = \frac{\sum_{b \in N} \text{sim}(a, b) \cdot r_{u,b}}{\sum_{b \in N} \text{sim}(a, b)} \qquad (4.2) \]
2. http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because
it is computationally more efficient when recommending a fixed group of recipes. Recommendations
in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant's recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, comparable or
better quality results than the best available user-based collaborative filtering algorithms [16, 15].
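The two formulas above can be sketched in a few lines of code. The following is a minimal illustration (not the actual YoLP implementation) in which ratings are held in plain dictionaries mapping each item to the ratings it received per user; Eq. 4.2 is implemented exactly as stated, as a weighted sum of rating deviations normalized by the sum of similarities:

```python
import math

def pearson_item_sim(ratings, a, b):
    """Pearson correlation between items a and b (Eq. 4.1).
    ratings: dict mapping item -> {user: rating}."""
    shared = set(ratings[a]) & set(ratings[b])  # P: users who rated both items
    if not shared:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])  # average rating of item a
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in shared)
    den = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in shared)) \
        * math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in shared))
    return num / den if den else 0.0

def predict_rating(ratings, user_ratings, a):
    """Eq. 4.2: weighted sum of the user's rating deviations over the items
    b in N, normalized by the sum of similarities.
    user_ratings: {item: rating given by the target user}."""
    num = den = 0.0
    for b, r_ub in user_ratings.items():
        s = pearson_item_sim(ratings, a, b)
        mean_b = sum(ratings[b].values()) / len(ratings[b])
        num += s * (r_ub - mean_b)
        den += s
    return num / den if den else 0.0
```

Note that, as in Eq. 4.2, the returned value is a deviation-style score; many formulations additionally add the target item's average rating to anchor it on the rating scale.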
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recom-
mended recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the features of the recipes that the user positively rated,
i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values
to the profile vector.
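The profile construction and cosine comparison just described can be sketched as follows. This is a minimal illustration under the assumption that sparse vectors are represented as dictionaries mapping feature names to weights; the feature names used in the test are hypothetical examples, not YoLP's actual identifiers:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_profile(rated_recipes):
    """Binary user profile: the features of positively rated recipes
    (rating 4 or 5) are set to 1, as described in the text.
    rated_recipes: list of (rating, feature_set) pairs."""
    profile = {}
    for rating, features in rated_recipes:
        if rating >= 4:
            for f in features:
                profile[f] = 1
    return profile
```

Recommendation then amounts to scoring each candidate recipe vector against the profile with `cosine` and sorting from most to least similar.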
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Foodcom
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It
is important to notice that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since it is likely that this method of transforming a similarity
measure into a rating introduces a small error in the results. Another approximation is the fact that
YoLP considers context features at the moment of the recommendation, and these are not included
in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights to be used in Rocchio's algorithm
is presented. Next, two different approaches are introduced to build the users' prototype vectors
and, lastly, the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in
Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure could be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature F_k is assumed to
be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
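Eq. 4.4 can be computed in a single pass over the dataset. A minimal sketch, assuming recipes are represented as sets of feature identifiers:

```python
import math

def irf_weights(recipes):
    """IRF_k = log(M / M_k) for each feature k (Eq. 4.4).
    recipes: iterable of feature sets; M is the total number of recipes,
    M_k the number of recipes containing feature k."""
    recipes = list(recipes)
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in features:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / m_k) for k, m_k in counts.items()}
```

Features that occur in every recipe receive weight log(1) = 0, so, as with IDF in text retrieval, ubiquitous ingredients contribute nothing to the prototype vectors.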
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each determine the impact that a rated recipe has on the
user prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine if a rating event is considered a positive or negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same
process is applied to this dataset, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach utilizes the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's prefer-
ences. These are directly obtained from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user prototype vec-
tor. In positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector. In negative observations, the feature weights are subtracted.
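The add-or-subtract update just described can be sketched as follows. This is a minimal illustration of the fixed-threshold approach (the second approach would simply pass the user's average rating as the threshold); both observation types weigh 1, as in the experiments:

```python
def build_prototype(rated_recipes, weights, threshold=3):
    """Rocchio-style prototype vector. IRF feature weights of positively
    rated recipes are added; those of negatively rated recipes subtracted.
    rated_recipes: list of (rating, feature_set) pairs for one user.
    weights: {feature: IRF weight}, e.g. from irf_weights().
    threshold: ratings >= threshold count as positive observations."""
    prototype = {}
    for rating, features in rated_recipes:
        sign = 1 if rating >= threshold else -1  # positive vs. negative observation
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * weights.get(f, 0.0)
    return prototype
```

Neutral ratings (e.g. the value 3 in the Foodcom setup) would be filtered out before calling this function.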
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find.
Epicurious and Foodcom, which will be presented in the next section, are food-related datasets
with relevant information on the recipes, containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches are
presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So the following steps were applied: compute each user's similarity variation from
the validation set and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped for each user into the rating range and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (maximum value of A
minus minimum value of A), the user average was used as default for the recommendation.
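The per-user mapping, including the fallback to the user's average, can be sketched as:

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Map a similarity value A from the user's similarity interval
    [sim_min, sim_max] into the user's rating range [C, D] = [r_min, r_max]
    (Eq. 4.5). Falls back to the user's average rating when the similarity
    interval cannot be computed (degenerate interval)."""
    if sim_max == sim_min:
        return user_avg  # not enough ratings to establish a similarity scale
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min
```

For example, a similarity halfway through the user's observed interval maps to the midpoint of that user's rating range.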
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)
Three different approaches were tested: using the user's rating average and the user's standard
deviation; using the recipe's rating average and the recipe's standard deviation; and using the com-
bined average of the user and recipe averages and standard deviations.
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Foodcom    Epicurious
Number of users                        24741        8117
Number of food items                  226025       14976
Number of rating events               956826       86574
Number of ratings above avg           726467       46588
Number of groups                         108          68
Number of ingredients                   5074         338
Number of categories                      28          14
Sparsity on the ratings matrix         0.02%       0.07%
Avg rating values                       4.68        3.34
Avg number of ratings per user         38.67       10.67
Avg number of ratings per item          4.23        5.78
Avg number of ingredients per item      8.57        3.71
Avg number of categories per item       2.33        0.60
Avg number of food groups per item      0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value to the recipe Since the notion of a high rating value varies between
users and recipes their averages and standard deviation can help determine with more accuracy
the final rating recommended for the recipe Later on in Chapter 5 the upper and lower similarity
thresholds values used in this method U and L respectively will be optimized to obtain the best
recommendation performances but initially the upper threshold U is 075 and the lower threshold L
025
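Eq. 4.6 translates directly into a three-way branch. A minimal sketch, with the initial thresholds U = 0.75 and L = 0.25 as defaults; the average and standard deviation passed in can come from the user, the recipe, or their combination, as in the three tested variants:

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: push the average rating up or down by one standard deviation
    depending on where the similarity falls relative to thresholds U and L."""
    if sim >= upper:
        return avg + std
    if sim >= lower:      # i.e. L <= sim < U
        return avg
    return avg - std
```

For instance, with an average of 3.5 and a standard deviation of 0.6, similarities of 0.8, 0.5, and 0.1 would produce predictions of 4.1, 3.5, and 2.9, respectively.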
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommen-
dation system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large on-
line recipe sharing community4. The second dataset is composed of crawled data obtained from a
website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated
recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated
3 times or fewer were removed, as well as the users who rated 5 times or fewer. In Table
4.1, a statistical characterization of the two datasets is presented after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Cen-
tral/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.

4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Foodcom rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way that ingredients are
represented. In Foodcom, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen
by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data of the datasets is presented. Figures
4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Foodcom dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excel-
lent for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.

6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first ex-
perimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides an insight into how the model will generalize to an
independent dataset. More specifically, the data was partitioned into k folds, each fold serving in turn
as the validation set while the remaining observations form the training set. To reduce
variability, this process is repeated multiple times, using a different fold as the validation set each time,
and the validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the chosen number of folds was 5, so the process is repeated 5 times; this is also known
as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
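The 5-fold split described above can be sketched as follows (a minimal illustration, not the evaluation module itself; the shuffling seed is an arbitrary assumption for reproducibility):

```python
import random

def k_fold_splits(events, k=5, seed=42):
    """Partition rating events into k folds; each fold serves once as the
    validation set (20% of the data for k=5) while the remaining folds
    (80%) form the training set."""
    events = list(events)
    random.Random(seed).shuffle(events)          # shuffle once, deterministically
    folds = [events[i::k] for i in range(k)]     # k disjoint folds
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```

Each `(training, validation)` pair would then drive one round of model building and error measurement, with the per-fold errors averaged at the end.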
Accuracy is measured when comparing the known data from the validation set with the outputs of
the system (ie the prediction values) In the simplest case the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example
in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's previ-
ously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously men-
tioned in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
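The two error measures are standard and can be sketched in a few lines; this is a minimal illustration of the definitions used throughout the evaluation:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between
    predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on large errors."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

Because of the squaring, RMSE is always greater than or equal to MAE on the same predictions, which explains the diverging behaviour of the two measures observed later in Section 5.4.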
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some base-
lines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset aver-
ages as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item aver-
ages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious           Foodcom
                                  MAE      RMSE        MAE      RMSE
YoLP Content-based component      0.6389   0.8279      0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678      0.3761   0.6834
User Average                      0.6315   0.8338      0.4077   0.6207
Item Average                      0.7701   1.0930      0.4385   0.7043
Combined Average                  0.6628   0.8572      0.4180   0.6250
Table 5.2: Test Results

                                               Epicurious                            Foodcom
                                     Observation      Observation         Observation      Observation
                                     User Average     Fixed Threshold     User Average     Fixed Threshold
                                     MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
User Avg + User Standard Deviation   0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard Deviation   0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and Item
Standard Deviation                   0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
Min-Max                              0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, or the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user's average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the line entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using the fixed
threshold of 3 to separate positive and negative observations. The second conclusion that can
be drawn from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious           Foodcom
                                    MAE      RMSE        MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927      0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012      0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986      0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616      0.4324   0.7087
Ingredients                         0.7932   1.0054      0.4411   0.6537
Cuisine                             0.8553   1.0810      0.5357   0.7431
Dietary                             0.8772   1.0807      0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods, it is important to determine if all features are helping to obtain
the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the best-performing method combination was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to trans-
form the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
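The per-feature-type storage and merging scheme can be sketched as follows. This is a minimal illustration under the assumption that the three stored vectors are kept in a dictionary keyed by feature type; the feature identifiers in the test are hypothetical:

```python
def merge_vectors(stored, active_types):
    """Merge a user's per-feature-type prototype vectors (e.g. 'ingredients',
    'cuisine', 'dietary') into a single sparse vector, keeping only the
    feature types enabled for the current feature test."""
    merged = {}
    for feature_type in active_types:
        merged.update(stored.get(feature_type, {}))
    return merged
```

Each feature-combination test then just selects which of the three stored vectors to merge before computing the cosine similarity, instead of rebuilding the prototype from scratch.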
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them and, although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the
price of a meal, can increase the correlation between the user's preferences and items they dislike,
so it is important to test the impact of every new feature before implementing it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but now other cases need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and discover the similarity thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the devia-
tion between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it predicts the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while for others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better re-
sults when using the Foodcom dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and
the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objec-
tive of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and analyse whether the recommendation error starts to converge after a determined number of
reviews. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest
threshold chosen for this dataset, in order to maintain a considerable number of users to average the
recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100
recipes and, since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Foodcom dataset, up to 100 rated recipes
error and, after 25 recipes rated, the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning Curve using the Foodcom dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recom-
mendation was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22] and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so vari-
ous approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the recommen-
dation system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. These approaches combined returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the target recipe would automatically attribute it a predicted rating.
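This class-per-rating idea can be sketched as follows; this is a minimal illustration, assuming a sparse dict-based vector representation and cosine similarity, not code from the dissertation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts feature -> weight)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_class_vectors(rated_recipes):
    """One prototype vector per rating value: sum the feature weights of
    all recipes the user rated with that value."""
    classes = {}
    for features, rating in rated_recipes:
        proto = classes.setdefault(rating, {})
        for f, w in features.items():
            proto[f] = proto.get(f, 0.0) + w
    return classes

def predict_rating(classes, recipe_features):
    """The class (rating) whose vector is most similar to the recipe."""
    return max(classes, key=lambda r: cosine(classes[r], recipe_features))
```

The predicted rating falls out directly from the most similar class vector, which is exactly why no similarity-to-rating transformation would be needed.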
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
The RMSE measure was used in the famous Netflix competition1, where a prize of $1,000,000 would be awarded to anyone who presented an algorithm with an accuracy improvement (in RMSE) of 10% over Netflix's own recommendation algorithm at the time, called Cinematch.

1http://www.netflixprize.com
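For reference, the two error metrics used throughout this dissertation can be computed as in the generic sketch below; this is illustrative code, not the thesis' evaluation component.

```python
import math

def rmse(predicted, actual):
    """Root Mean Squared Error between predicted and observed ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean Absolute Error over the same rating pairs."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

For the same set of errors, RMSE is always at least as large as MAE, since squaring penalizes large deviations more heavily; this is why the Netflix competition's 10% target was stated in terms of RMSE.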
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to further explore in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on the user's preferences, extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients $I^{+}_{k}$, an equation based on the idea of TF-IDF is used:

$I^{+}_{k} = FF_k \times IRF_k$  (3.1)

$FF_k$ is the frequency of use ($F_k$) of ingredient $k$ during a period $D$:

$FF_k = \frac{F_k}{D}$  (3.2)

The notion of IDF (inverse document frequency) is specified through the Inverse Recipe Frequency $IRF_k$, which uses the total number of recipes $M$ and the number of recipes that contain ingredient $k$ ($M_k$):

$IRF_k = \log \frac{M}{M_k}$  (3.3)

The user's disliked ingredients $I^{-}_{k}$ are estimated by considering the ingredients in the browsing history with which the user has never cooked.
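Eqs. (3.1)–(3.3) can be sketched as follows; the function and variable names are illustrative, and a recipe is modeled simply as a set of ingredient names.

```python
import math

def favourite_ingredient_scores(user_recipes, all_recipes, period_days):
    """Score each ingredient k as I+_k = FF_k * IRF_k (Eqs. 3.1-3.3).
    user_recipes: list of ingredient sets the user used during the period.
    all_recipes:  list of ingredient sets in the whole collection."""
    M = len(all_recipes)
    used = [k for recipe in user_recipes for k in recipe]
    scores = {}
    for k in set(used):
        FF = used.count(k) / period_days           # Eq. 3.2: F_k / D
        Mk = sum(1 for r in all_recipes if k in r)
        IRF = math.log(M / Mk)                     # Eq. 3.3
        scores[k] = FF * IRF                       # Eq. 3.1
    return scores
```

As with TF-IDF, an ingredient used often by the user but rare in the collection receives the highest score.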
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for the users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top $N$ ingredients, sorted by $I^{+}_{k}$, were computed. The F-measure is computed as follows:

$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$  (3.4)
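The top-N evaluation described here (precision, recall, and the F-measure of Eq. 3.4) can be sketched as below; the function name and data shapes are illustrative.

```python
def evaluate_top_n(scores, true_favourites, n):
    """Precision, recall and F-measure (Eq. 3.4) of the top-N ingredients
    ranked by score, against the user's questionnaire favourites."""
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    hits = sum(1 for k in top if k in true_favourites)
    precision = hits / n
    recall = hits / len(true_favourites)
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f
```

Increasing $N$ trades precision for recall, which is exactly the pattern reported in the results below.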
When focusing on the top ingredient ($N = 1$), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for $N$ were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients ($N = 20$), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also with $N = 20$, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by $I^{+}_{k}$ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like an ingredient $k$ contained in both recipes, the recipe with the higher quantity of $k$ should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences; when performing a recommendation, the system now also considers the ingredient quantity of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and dispersion quantity of each ingredient. The standard deviation of an ingredient $k$ is obtained as follows:

$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (g_k(i) - \bar{g}_k)^2}$  (3.5)

In the formula, $n$ denotes the number of recipes that contain ingredient $k$, $g_k(i)$ denotes the quantity of ingredient $k$ in recipe $i$, and $\bar{g}_k$ represents the average of $g_k(i)$ (i.e., the previously computed average quantity of ingredient $k$ over all the recipes in the database). According to the deviation score, a weight $W_k$ is assigned to the ingredient. The final score of a recipe $R$ is computed considering the weight $W_k$ and the user's liked and disliked ingredients $I_k$ (i.e., $I^{+}_{k}$ and $I^{-}_{k}$, respectively):

$\text{Score}(R) = \sum_{k \in R} (I_k \cdot W_k)$  (3.6)

1http://cookpad.com
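Eqs. (3.5) and (3.6) can be sketched as follows; note that this summary does not fully specify how the deviation score maps to the weight $W_k$, so the weights are taken as given inputs in this sketch.

```python
import math

def ingredient_std(quantities):
    """Eq. 3.5: standard deviation of an ingredient's quantity over the
    n recipes that contain it (population form, as in the paper)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weights):
    """Eq. 3.6: Score(R) = sum over ingredients k in R of I_k * W_k.
    preference holds the I+_k / I-_k values; weights holds the W_k."""
    return sum(preference.get(k, 0.0) * weights.get(k, 0.0)
               for k in recipe_ingredients)
```

A disliked ingredient (negative $I_k$) with a large weight thus pulls the recipe's score down, capturing the pepper-versus-potato intuition above.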
The approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem: the movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.

The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from each neighbor's mean. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists in performing collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector $v_u$ for every user $u$ in the database. The pseudo user-ratings vector consists of the item ratings provided by user $u$ where available, and of those predicted by the content-based method otherwise:

$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}$  (3.7)

Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix $V$ is created, and the similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight, which allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
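The densification step of Eq. (3.7) can be sketched in a few lines; the names are illustrative, and the content-based predictions $c_{u,i}$ are taken as given inputs.

```python
def pseudo_ratings(actual, content_pred, items):
    """Eq. 3.7: v_{u,i} = r_{u,i} if user u rated item i, otherwise the
    content-based prediction c_{u,i}. Yields one dense row of matrix V."""
    return {i: actual.get(i, content_pred[i]) for i in items}
```

With every row of $V$ dense, the Pearson correlation between any two users is always defined, which is how CBCF sidesteps the sparsity problem.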
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, while the MAE of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned, including cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores are computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendation. The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of $N$ neighbors. The second is a content-based algorithm, which breaks recipes down into ingredients and assigns ratings to them based on the user's recipe scores; finally, with the user's ingredient scores, a prediction is computed for the recipe.
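The break-down and reconstruction steps of this content-based strategy can be sketched as follows; averaging the known ingredient scores in the reconstruction step is an assumption of this sketch, made in the spirit of the simplistic averaging described above.

```python
def ingredient_scores(user_ratings, recipes):
    """Break-down step: score each ingredient as the average rating of
    the user's rated recipes in which it occurs."""
    sums, counts = {}, {}
    for recipe_id, rating in user_ratings.items():
        for k in recipes[recipe_id]:
            sums[k] = sums.get(k, 0.0) + rating
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def predict(recipe_ingredients, ing_scores):
    """Reconstruction step: average the scores of the recipe's known
    ingredients; None when no ingredient has been scored yet."""
    known = [ing_scores[k] for k in recipe_ingredients if k in ing_scores]
    return sum(known) / len(known) if known else None
```

An unseen recipe can thus be scored as soon as any of its ingredients has appeared in a rated recipe, which is what makes the ingredient breakdown useful against sparsity.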
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix; this matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as the evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has the best overall performance in this case, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implements a simplistic version of what a recipe recommender needs to achieve: as mentioned earlier, there are many other factors that influence a user's rating which can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction; in some cases, items that are too similar to others already seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known news article content-based recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others known by the user probably carry the same information, and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the $k$ nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests: the main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model, and is passed to the long-term model, explained in more detail in [23].
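This short-term prediction procedure can be sketched as follows; the concrete threshold values used by Daily-Learner are not given here, so the defaults below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors (dicts)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def short_term_predict(new_vec, rated_stories, t_min=0.3, t_max=0.9):
    """Stories above t_min become voters; a voter above t_max marks the
    new story as already known; with no voters, the story is deferred
    to the long-term model (returned here as None)."""
    voters = []
    for vec, score in rated_stories:
        sim = cosine(new_vec, vec)
        if sim >= t_max:
            return "known"
        if sim >= t_min:
            voters.append((sim, score))
    if not voters:
        return None
    # Similarity-weighted average of the voting stories' scores.
    return sum(s * r for s, r in voters) / sum(s for s, _ in voters)
```

The two thresholds play opposite roles: the minimum one selects which stories may vote, while the maximum one suppresses recommendations that are too close to something the user already knows.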
This issue should be taken into consideration in food recommendation as well, since users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt the Rocchio algorithm to personalized food recommendation. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, by measuring the performance of the algorithms using different metrics; the methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach, explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, while in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, while in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

1http://www.python.org

Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$  (4.1)

where $a$ and $b$ are recipes, $r_{a,p}$ is the rating from user $p$ to recipe $a$, $P$ is the group of users that rated both recipe $a$ and recipe $b$, and, lastly, $\bar{r}_a$ and $\bar{r}_b$ are recipe $a$'s and recipe $b$'s average ratings, respectively.

After the similarity is computed, the rating prediction is calculated using the following equation:

$pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}$  (4.2)
2http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, $pred(u, a)$ is the prediction value for user $u$ and item $a$, and $N$ is the set of items rated by user $u$. Using the set of the user's rated items, the user's rating for each item $b$ is weighted according to the similarity between $b$ and the target item $a$, and the predicted rating is normalized by the sum of the similarities.
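Eqs. (4.1) and (4.2) can be sketched as follows; since the similarity-weighted deviation of Eq. (4.2) yields an offset rather than a rating, the sketch adds the target item's mean back, a common variant that is an assumption of this sketch rather than something stated in the text.

```python
import math

def item_mean(ratings, i):
    """Average rating of item i. ratings: {user: {item: rating}}."""
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def item_sim(ratings, a, b):
    """Pearson correlation between items a and b (Eq. 4.1), over the
    set P of users who rated both."""
    P = [u for u, r in ratings.items() if a in r and b in r]
    if not P:
        return 0.0
    ra, rb = item_mean(ratings, a), item_mean(ratings, b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in P)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in P) *
                    sum((ratings[u][b] - rb) ** 2 for u in P))
    return num / den if den else 0.0

def predict(ratings, u, a):
    """Eq. 4.2 (deviation form), shifted by item a's mean rating."""
    sims = [(item_sim(ratings, a, b), b) for b in ratings[u] if b != a]
    den = sum(s for s, _ in sims)
    if den == 0:
        return item_mean(ratings, a)
    num = sum(s * (ratings[u][b] - item_mean(ratings, b)) for s, b in sims)
    return item_mean(ratings, a) + num / den
```

In a deployment like YoLP's, the target items $a$ would range only over the recipes of the restaurant where the user is located, which keeps the number of similarity computations small.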
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and to compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

$\text{Rating} = \begin{cases} avgTotal + 0.5 & \text{if similarity} > 0.8 \\ avgTotal & \text{otherwise} \end{cases}$  (4.3)

where $avgTotal$ represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
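Eq. (4.3) amounts to a small function; how the user and item averages are combined into $avgTotal$ is not specified at this point, so the arithmetic mean used below is an assumption of this sketch.

```python
def similarity_to_rating(similarity, user_avg, item_avg,
                         threshold=0.8, bonus=0.5):
    """Eq. 4.3: start from the combined user/item average (avgTotal)
    and add a small bonus when the profile-recipe similarity is high."""
    avg_total = (user_avg + item_avg) / 2.0  # assumed way of combining the averages
    return avg_total + bonus if similarity > threshold else avg_total
```

Because the output only ever takes two values per user/item pair, the transformation is coarse, which is the source of the approximation error acknowledged above.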
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in the Rocchio algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for the Rocchio algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature Fk is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = log(M / M_k)    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all recipe feature weights were computed using only the IRF, determined over the complete dataset.
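As a concrete illustration, the IRF weighting of Eq. 4.4 can be sketched in a few lines of Python. This is only a minimal sketch under the assumptions stated in the text (FF fixed at 1, so the IRF alone weights each feature); the `recipes` dictionary and the function name are hypothetical, not the actual YoLP implementation.

```python
import math

def irf_weights(recipes):
    """Compute IRF_k = log(M / M_k) for every ingredient k.

    recipes: dict mapping a recipe id to its set of ingredients.
    Since the feature frequency FF is assumed to be always 1,
    the IRF alone gives each recipe feature its weight.
    """
    M = len(recipes)  # total number of recipes
    counts = {}       # M_k: number of recipes containing ingredient k
    for ingredients in recipes.values():
        for k in ingredients:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / m_k) for k, m_k in counts.items()}

# Toy data: "tomato" appears in 2 of 3 recipes -> log(3/2);
# "basil" appears in only 1 recipe -> log(3), a higher weight.
recipes = {
    "r1": {"tomato", "basil", "pasta"},
    "r2": {"tomato", "cheese"},
    "r3": {"chicken", "rice"},
}
weights = irf_weights(recipes)
```

As with IDF in text retrieval, ingredients that occur in fewer recipes receive higher weights, so rare ingredients dominate the prototype vectors.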
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. To determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
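The construction described above can be sketched as follows, using the fixed-threshold labeling of observations. All names (`build_prototype`, `rated`, `recipe_features`) are hypothetical; this is a sketch of the update rule, with equal weight 1 for positive and negative observations, not the thesis code.

```python
def build_prototype(rated, recipe_features, weights, threshold=3):
    """Build a user's Rocchio prototype vector.

    rated: list of (recipe_id, rating) pairs from the training set.
    recipe_features: recipe_id -> set of features (ingredients, ...).
    weights: feature -> IRF weight.
    Ratings >= threshold count as positive observations (feature
    weights are added); lower ratings are negative observations
    (feature weights are subtracted).
    """
    prototype = {}
    for recipe_id, rating in rated:
        sign = 1.0 if rating >= threshold else -1.0
        for f in recipe_features[recipe_id]:
            prototype[f] = prototype.get(f, 0.0) + sign * weights.get(f, 0.0)
    return prototype

# Toy data: one liked recipe (rating 4) and one disliked recipe (rating 1).
features = {"r1": {"tomato", "basil"}, "r2": {"tomato", "cheese"}}
w = {"tomato": 0.4, "basil": 1.1, "cheese": 1.1}
p = build_prototype([("r1", 4), ("r2", 1)], features, w)
# "tomato" was liked once and disliked once, so its weight cancels out,
# while "basil" ends positive and "cheese" negative.
```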
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe's feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula:3

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity range from the validation set, and compute each user's rating range from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
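The per-user mapping of Eq. 4.5 can be sketched as below. The function name is hypothetical; the degenerate-interval fallback shown here (midpoint of the rating range) is a simplification of the user-average default described in the text.

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max):
    """Map a similarity value onto the user's rating scale (Eq. 4.5).

    sim_min/sim_max come from the user's similarities; rating_min and
    rating_max come from the user's ratings in the training set.
    Falls back to the midpoint of the rating range when the similarity
    interval is degenerate (the text uses the user average as the
    default in that case).
    """
    if sim_max == sim_min:
        return (rating_min + rating_max) / 2.0
    return ((sim - sim_min) / (sim_max - sim_min)
            * (rating_max - rating_min) + rating_min)

# A similarity of 0.5, observed in a user's range [0.2, 0.8], lands
# exactly in the middle of that user's 1-4 rating scale: 2.5.
predicted = min_max_rating(0.5, 0.2, 0.8, 1, 4)
```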
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L
                                                                   (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
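Eq. 4.6 can be sketched directly, with the initial thresholds U = 0.75 and L = 0.25 from the text. The function name is hypothetical; `avg` and `std` stand for whichever of the three average/standard-deviation combinations is being tested.

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Translate a similarity into a rating using Eq. 4.6.

    avg and std can be the user's, the recipe's, or the combination
    of both averages and standard deviations; upper/lower are the
    initial thresholds U = 0.75 and L = 0.25.
    """
    if sim >= upper:
        return avg + std          # high-similarity band
    if sim >= lower:
        return avg                # middle band
    return avg - std              # low-similarity band

# With a (hypothetical) user average of 3.3 and std of 0.6, a high
# similarity adds the deviation, a low one subtracts it.
high = rating_from_similarity(0.9, 3.3, 0.6)
mid = rating_from_similarity(0.5, 3.3, 0.6)
low = rating_from_similarity(0.1, 3.3, 0.6)
```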
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                     Foodcom   Epicurious
Number of users                       24,741        8,117
Number of food items                 226,025       14,976
Number of rating events              956,826       86,574
Number of ratings above avg          726,467       46,588
Number of groups                         108           68
Number of ingredients                  5,074          338
Number of categories                      28           14
Sparsity on the ratings matrix         0.02%        0.07%
Avg rating values                       4.68         3.34
Avg number of ratings per user         38.67        10.67
Avg number of ratings per item          4.23         5.78
Avg number of ingredients per item      8.57         3.71
Avg number of categories per item       2.33         0.60
Avg number of food groups per item      0.87         0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community.4 The second dataset is composed of crawled data obtained from a website named Epicurious.5 This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated 3 or fewer times were removed, as well as all users with 5 or fewer ratings. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
31
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Foodcom rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
32
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL.6 Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 parts and the process repeated 5 times, which is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
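The 5-fold split just described can be sketched as below. The function name and the shape of the rating events are hypothetical; this is a generic fold generator, not the evaluation module of this work.

```python
import random

def five_fold_splits(rating_events, seed=0):
    """Split rating events into 5 folds for cross-validation.

    Each fold in turn serves as the validation set (20% of the data)
    while the remaining four folds form the training set (80%).
    Yields (training_set, validation_set) pairs.
    """
    events = list(rating_events)
    random.Random(seed).shuffle(events)      # reproducible shuffle
    folds = [events[i::5] for i in range(5)]  # round-robin into 5 folds
    for i in range(5):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i
                    for e in fold]
        yield training, validation

# Toy data: 10 (userID, itemID, rating) events -> folds of 2/8.
events = [("user1", f"recipe{n}", 4) for n in range(10)]
```

The per-fold MAE and RMSE values are then averaged over the 5 folds, as described above.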
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of the content-based algorithms.
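The two error measures can be sketched directly from their standard definitions (the function names and the toy ratings are illustrative only):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute deviation between
    predicted and actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on large errors."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

# Toy validation data: one large miss (5 predicted as 3.0) dominates
# the RMSE more than it dominates the MAE.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 3.0, 2.5]
# deviations: 0.5, 0.0, 2.0, 0.5 -> MAE = 0.75, RMSE = sqrt(1.125)
```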
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious         Foodcom
                                 MAE     RMSE       MAE     RMSE
YoLP Content-based component     0.6389  0.8279     0.3590  0.6536
YoLP Collaborative component     0.6454  0.8678     0.3761  0.6834
User Average                     0.6315  0.8338     0.4077  0.6207
Item Average                     0.7701  1.0930     0.4385  0.7043
Combined Average                 0.6628  0.8572     0.4180  0.6250

Table 5.2: Test results

                                 Epicurious                          Foodcom
                                 Observation      Observation        Observation      Observation
                                 User Average     Fixed Threshold    User Average     Fixed Threshold
                                 MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
User Avg + User Std. Dev.        0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
Item Avg + Item Std. Dev.        0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and
Item Std. Dev.                   0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
Min-Max                          0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                   Epicurious        Foodcom
                                   MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries  0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine              0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary              0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                  0.8266  1.0616    0.4324  0.7087
Ingredients                        0.7932  1.0054    0.4411  0.6537
Cuisine                            0.8553  1.0810    0.5357  0.7431
Dietary                            0.8772  1.0807    0.4579  0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing a user's prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
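The merge-then-compare step can be sketched as follows. The names are hypothetical, and the sketch assumes each feature type has its own namespace, so the three per-feature vectors never share keys and can be merged by simple dictionary union.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def merge(*vectors):
    """Merge the per-feature prototype vectors (e.g. ingredients,
    cuisines, dietaries) selected for a given feature test.
    Assumes the vectors have disjoint keys."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

# Hypothetical stored vectors for one user, and one candidate recipe.
ingredients = {"tomato": 0.8, "basil": 1.1}
cuisines = {"Italian": 0.5}
recipe = {"tomato": 0.8, "Italian": 0.5}

sim_all = cosine(merge(ingredients, cuisines), recipe)  # all features
sim_ingr = cosine(ingredients, recipe)                  # ingredients only
```

Because the three vectors are stored separately, any of the seven feature combinations in Table 5.3 is just a different call to `merge`, with no prototype rebuilding.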
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case: some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Foodcom dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Foodcom dataset
Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if similarity < U
                                                                   (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
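The MAE/RMSE trade-off observed in this test can be reproduced with a toy example (illustrative numbers only, unrelated to the datasets): a predictor that is exactly right more often but occasionally misses badly gets a lower MAE yet a higher RMSE than a predictor that is slightly off every time.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 4, 4, 4]
pred_exact = [4, 4, 4, 1]          # exact three times, one large miss
pred_close = [3.2, 3.2, 3.2, 3.2]  # slightly off every time

# MAE rewards the frequent exact hits: 0.75 vs 0.80.
# RMSE punishes the single large miss: 1.50 vs 0.80.
```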
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error was observed towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe's features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users who rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors over (see Fig. 5.8). In Foodcom, 1,571 users were found who rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Foodcom dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Foodcom dataset, up to 500 rated recipes
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained the user reviews needed to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example, the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute a predicted rating to it.
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1-55860-486-3.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 1-55860-555-X.
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1-58113-348-0. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1-58113-459-2. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0-262-51129-0.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 978-988-19252-5-1.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3-642-13469-6. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation. Volume 8444 of LNCS, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
Chapter 3
Related Work
This chapter presents a brief review of four previously proposed recommendation approaches. The works described in this chapter contain interesting features to explore further in the context of personalized food recommendation using content-based methods.
3.1 Food Preference Extraction for Personalized Cooking Recipe Recommendation
Based on user preferences extracted from recipe browsing (i.e., from recipes searched) and cooking history (i.e., recipes actually cooked), the system described in [3] recommends recipes that score highly regarding the user's favourite and disliked ingredients. To estimate the user's favourite ingredients I_k^+, an equation based on the idea of TF-IDF is used:
I_k^+ = FF_k × IRF_k    (3.1)
FF_k is the frequency of use (F_k) of ingredient k during a period D:

FF_k = F_k / D    (3.2)
The notion of IDF (inverse document frequency) is captured through the Inverse Recipe Frequency IRF_k, which uses the total number of recipes M and the number of recipes that contain ingredient k (M_k):

IRF_k = log(M / M_k)    (3.3)
The user's disliked ingredients I_k^- are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire. Responses were coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I_k^+, were computed. The F-measure is computed as follows:
by I+k was computed The F-measure is computed as follows
F-measure =2times PrecisiontimesRecallPrecision+Recall
(34)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely of only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with the value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I_k^+ for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail, because the accuracy values obtained in the evaluation were not satisfactory.
In this work, a recipe's score is determined by whether the ingredients exist in it or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This does not correspond to real eating habits: e.g., if a specific user does not like the ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considered the ingredient quantity of a target recipe.
1 http://cookpad.com

When considering ingredient proportions, the impact on a recipe of 100 grams of two different ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usually observed quantity of pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σ_k = √( (1/n) Σ_{i=1}^{n} (g_k(i) − ḡ_k)² )    (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredient scores I_k (i.e., I_k^+ and I_k^-, respectively):
Score(R) = Σ_{k∈R} (I_k · W_k)    (3.6)
The approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
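As a concrete illustration of the quantity-aware scoring of Eqs. (3.5) and (3.6), the sketch below computes an ingredient's standard deviation over a toy recipe corpus and a recipe score from hypothetical user preference values I_k. The mapping from the deviation score to the weight W_k is not fully specified here, so the weight function is passed in as an assumption, and all data is illustrative:

```python
import math

# Toy corpus: each recipe maps ingredient -> quantity in grams (illustrative data).
recipes = {
    "stew":  {"potato": 300, "pepper": 5},
    "curry": {"potato": 250, "pepper": 20},
    "soup":  {"potato": 350},
}

def std_dev(ingredient):
    """Eq. (3.5): standard deviation of an ingredient's quantity
    over the recipes that contain it."""
    qs = [q[ingredient] for q in recipes.values() if ingredient in q]
    mean = sum(qs) / len(qs)
    return math.sqrt(sum((g - mean) ** 2 for g in qs) / len(qs))

def score(recipe, user_pref, weight):
    """Eq. (3.6): Score(R) = sum of I_k * W_k over the recipe's ingredients."""
    return sum(user_pref.get(k, 0.0) * weight(k) for k in recipes[recipe])

# Example: a user who likes potato (I = 1.0) and dislikes pepper (I = -0.5),
# with a uniform weight of 1 standing in for the unspecified W_k.
print(score("stew", {"potato": 1.0, "pepper": -0.5}, lambda k: 1.0))  # 0.5
```

In a full implementation, the weight function would grow with an ingredient's deviation from its usual quantity, so that unusual amounts of pepper matter more than unusual amounts of potato.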
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF essentially consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u, where available, and of those predicted by the content-based method otherwise:
v_{u,i} =  r_{u,i},  if user u rated item i
           c_{u,i},  otherwise    (3.7)
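A minimal sketch of the pseudo-ratings construction of Eq. (3.7); `content_predict` stands in for the naive Bayesian content-based predictor and is just a placeholder here:

```python
def pseudo_vector(user_ratings, all_items, content_predict):
    """Eq. (3.7): keep the actual rating where available,
    fall back to the content-based prediction otherwise."""
    return {i: user_ratings.get(i, content_predict(i)) for i in all_items}

# A user who rated items "a" and "c"; item "b" is filled in by the predictor.
v_u = pseudo_vector({"a": 4, "c": 2}, ["a", "b", "c"], lambda i: 3)
print(v_u)  # {'a': 4, 'b': 3, 'c': 2}
```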
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe - ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
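This break-down step can be sketched as follows, assuming toy dictionaries for the user's recipe ratings and for the recipes' ingredient lists; reconstructing the recipe prediction as the plain mean of its ingredient scores is one simple reading of the description above, not necessarily the exact aggregation used in [22]:

```python
from collections import defaultdict

def ingredient_scores(ratings, ingredients_of):
    """Score each ingredient as the average rating of the recipes it occurs in."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe, rating in ratings.items():
        for ing in ingredients_of[recipe]:
            totals[ing] += rating
            counts[ing] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}

def predict(recipe, scores, ingredients_of):
    """Reconstruct a recipe prediction from its known ingredient scores."""
    vals = [scores[i] for i in ingredients_of[recipe] if i in scores]
    return sum(vals) / len(vals)

# Toy data: the user rated r1 and r2; r3 is predicted from shared ingredients.
ingredients_of = {"r1": ["egg", "flour"], "r2": ["egg"], "r3": ["egg", "flour"]}
scores = ingredient_scores({"r1": 4, "r2": 2}, ingredients_of)
print(predict("r3", scores, ingredients_of))  # (3.0 + 4.0) / 2 = 3.5
```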
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings, in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient rating scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered, under the assumption that items common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others already known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are then excluded from the recommendation. On the other hand, items similar in topic, but not similar in content, should make great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., above a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as weights. If a voter is closer than a maximum threshold (i.e., above a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and a story the user already knows does not need to be recommended. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
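The voting mechanism can be sketched as follows; the similarity function and both thresholds are parameters, since their concrete values in Daily-Learner are tuned and not reproduced here:

```python
def predict_story(new_story, rated_stories, similarity, t_min, t_max):
    """Short-term prediction in the style of Daily-Learner: stories more
    similar than t_min become voters; a voter above t_max marks the new
    story as already known; no voters means the short-term model abstains."""
    voters = [(similarity(new_story, s), score) for s, score in rated_stories]
    voters = [(sim, score) for sim, score in voters if sim >= t_min]
    if not voters:
        return None  # unclassifiable: hand over to the long-term model
    if any(sim >= t_max for sim, _ in voters):
        return "known"  # assumed already seen by the user
    total = sum(sim for sim, _ in voters)
    return sum(sim * score for sim, score in voters) / total

# Toy usage with precomputed similarities standing in for TF-IDF cosine values.
rated = [(0.5, 4), (0.9, 2)]
sim = lambda new, s: s  # each "story" is represented by its similarity here
print(predict_story(None, rated, sim, t_min=0.3, t_max=0.95))  # ≈ 2.71
```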
This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way the items are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in item-to-item, two items are considered similar if they were rated in a similar way by the same group of users.

1 https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation2
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / √( Σ_{p∈P} (r_{a,p} − r̄_a)² · Σ_{p∈P} (r_{b,p} − r̄_b)² )    (4.1)
where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = Σ_{b∈N} sim(a, b) · r_{u,b} / Σ_{b∈N} sim(a, b)    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of similarities.
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
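Eqs. (4.1) and (4.2) can be sketched over a ratings dictionary of the form user -> {recipe: rating}; for simplicity, the item averages in the correlation are taken over the shared user set P, which is one possible reading of the formula:

```python
import math

def item_sim(ratings, a, b):
    """Eq. (4.1): Pearson correlation between items a and b over the
    users P who rated both, with item averages taken over P."""
    shared = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not shared:
        return 0.0
    ra = sum(ratings[u][a] for u in shared) / len(shared)
    rb = sum(ratings[u][b] for u in shared) / len(shared)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in shared)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in shared)
                    * sum((ratings[u][b] - rb) ** 2 for u in shared))
    return num / den if den else 0.0

def predict(ratings, user, a):
    """Eq. (4.2): similarity-weighted average of the user's own ratings."""
    num = den = 0.0
    for b, r in ratings[user].items():
        if b != a:
            s = item_sim(ratings, a, b)
            num += s * r
            den += abs(s)
    return num / den if den else None

# Toy usage: predict u3's rating for recipe "a" from the similar recipe "b".
ratings = {"u1": {"a": 4, "b": 5}, "u2": {"a": 2, "b": 1}, "u3": {"b": 4}}
print(predict(ratings, "u3", "a"))  # sim(a, b) = 1.0, so the prediction is 4.0
```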
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating =  avgTotal + 0.5,  if similarity > 0.8
          avgTotal,        otherwise    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
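A minimal sketch of this step, with the sparse vectors represented as feature -> weight dictionaries; the 0.8 threshold and 0.5 bump are the values from Eq. (4.3), while the feature names are purely illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_to_rating(sim, avg_total):
    """Eq. (4.3): bump the combined user/item average for near matches."""
    return avg_total + 0.5 if sim > 0.8 else avg_total

# Binary user profile vs. a recipe feature vector (illustrative features).
profile = {"category:soup": 1, "region:north": 1, "ing:leek": 1}
recipe = {"category:soup": 1, "region:north": 1, "ing:leek": 1, "ing:salt": 1}
sim = cosine(recipe, profile)  # 3 / (2 * sqrt(3)) ≈ 0.866
print(similarity_to_rating(sim, 3.0))  # 3.5
```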
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipes' features and build the prototype vectors. In this work, the frequency of use of a feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
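The two steps above can be sketched as follows, assuming recipes represented as feature lists and the simple fixed-threshold variant for labelling observations (neutral-rating handling and the user-average variant are omitted for brevity):

```python
import math
from collections import defaultdict

def irf_weights(recipes):
    """Eq. (4.4): IRF_k = log(M / M_k), with FF_k fixed to 1."""
    m = len(recipes)
    counts = defaultdict(int)
    for features in recipes.values():
        for f in set(features):
            counts[f] += 1
    return {f: math.log(m / c) for f, c in counts.items()}

def prototype(rated, recipes, weights, positive_from=3):
    """Add a recipe's feature weights on positive observations,
    subtract them on negative ones (both with weight 1)."""
    proto = defaultdict(float)
    for recipe, rating in rated:
        sign = 1.0 if rating >= positive_from else -1.0
        for f in recipes[recipe]:
            proto[f] += sign * weights[f]
    return dict(proto)

# Toy usage: two recipes; the user liked r1 (rating 4) and disliked r2 (rating 1).
recipes = {"r1": ["ing:salt", "ing:egg"], "r2": ["ing:salt"]}
w = irf_weights(recipes)  # salt: log(2/2) = 0, egg: log(2/1)
p = prototype([("r1", 4), ("r2", 1)], recipes, w)
print(p["ing:egg"] == math.log(2))  # True
```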
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula3:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale of each user is mapped into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user's average was used as the default for the recommendation.
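A sketch of the per-user mapping, including the degenerate-interval fallback described above (defaulting to the user's average rating):

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Eq. (4.5): map a similarity from the user's observed interval
    [sim_min, sim_max] onto the user's rating range [r_min, r_max]."""
    if sim_max == sim_min:
        return user_avg  # interval cannot be computed: default to the average
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min

print(min_max_rating(0.5, 0.0, 1.0, 1, 5, 3.0))  # 3.0
print(min_max_rating(0.2, 0.2, 0.2, 1, 5, 4.2))  # 4.2 (fallback)
```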
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating =  average rating + standard deviation,  if similarity ≥ U
          average rating,                       if L ≤ similarity < U
          average rating − standard deviation,  if similarity < L    (4.6)
Three different approaches were tested: using the user's rating average and standard deviation, using the recipe's rating average and standard deviation, and using the combined average of the user's and the recipe's averages and standard deviations.
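Eq. (4.6) translates directly into a small function, shown here with the initial thresholds U = 0.75 and L = 0.25 as defaults:

```python
def threshold_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. (4.6): shift the average rating by one standard deviation,
    depending on where the similarity falls relative to the thresholds."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std

print(threshold_rating(0.8, 3.5, 0.5))  # 4.0
print(threshold_rating(0.5, 3.5, 0.5))  # 3.5
print(threshold_rating(0.1, 3.5, 0.5))  # 3.0
```

The avg and std arguments can be the user's, the recipe's, or the combined statistics, matching the three variants tested above.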
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

Table 4.1: Statistical characterization of the datasets used in the experiments

                                        Food.com    Epicurious
  Number of users                         24,741         8,117
  Number of food items                   226,025        14,976
  Number of rating events                956,826        86,574
  Number of ratings above avg.           726,467        46,588
  Number of groups                           108            68
  Number of ingredients                    5,074           338
  Number of categories                        28            14
  Sparsity of the ratings matrix           0.02%         0.07%
  Avg. rating value                         4.68          3.34
  Avg. number of ratings per user          38.67         10.67
  Avg. number of ratings per item           4.23          5.78
  Avg. number of ingredients per item       8.57          3.71
  Avg. number of categories per item        2.33          0.60
  Avg. number of food groups per item       0.87          0.61
4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe sharing community. The second dataset is composed of data crawled from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users with no more than 5 ratings. Table 4.1 presents a statistical characterization of the two datasets after this filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users

The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

6 http://www.mysql.com
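The text does not describe the actual table layout, so the following is only a plausible sketch of such a schema, using Python's built-in sqlite3 in place of MySQL to keep the example self-contained:

```python
import sqlite3

# Illustrative schema only: the actual YoLP MySQL tables are not described
# in the text, and sqlite3 stands in for MySQL to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (user_id INTEGER, recipe_id INTEGER, rating REAL);
CREATE TABLE recipe_features (
    recipe_id INTEGER,
    feature   TEXT,   -- e.g. 'Garlic', 'Italian', 'Vegan'
    kind      TEXT    -- 'ingredient', 'cuisine' or 'dietary'
);
CREATE TABLE prototype_vectors (user_id INTEGER, feature TEXT, weight REAL);
""")
conn.execute("INSERT INTO ratings VALUES (1, 10, 5.0)")
print(conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0])  # 1
```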
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine which features are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. Ideally, the process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 parts and the process repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
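The fold construction can be sketched as follows (illustrative only; the thesis does not specify how the folds were drawn):

```python
import random

def five_fold_splits(observations, seed=0):
    """Split rating events into 5 disjoint folds; each fold in turn serves as
    the 20% validation set, with the remaining 80% used for training."""
    obs = list(observations)
    random.Random(seed).shuffle(obs)
    folds = [obs[i::5] for i in range(5)]
    for i in range(5):
        validation = folds[i]
        training = [o for j, fold in enumerate(folds) if j != i for o in fold]
        yield training, validation

events = [(u, i) for u in range(10) for i in range(10)]  # 100 toy (user, item) pairs
sizes = [(len(tr), len(va)) for tr, va in five_fold_splits(events)]
print(sizes)  # [(80, 20), (80, 20), (80, 20), (80, 20), (80, 20)]
```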
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e. the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID;

• Item identification: itemID;

• Rating attributed by the userID to the itemID: rating.

Figure 5.1: 10-fold cross-validation example

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of the content-based algorithms.
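For reference, the two measures can be computed as follows (standard definitions; the helper names are not from the thesis):

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predictions, truth = [4.0, 3.0, 5.0, 2.0], [4.0, 4.0, 3.0, 2.0]
print(mae(predictions, truth))             # 0.75
print(round(rmse(predictions, truth), 3))  # 1.118
```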
5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
                                MAE      RMSE      MAE      RMSE
YoLP Content-based component    0.6389   0.8279    0.3590   0.6536
YoLP Collaborative component    0.6454   0.8678    0.3761   0.6834
User Average                    0.6315   0.8338    0.4077   0.6207
Item Average                    0.7701   1.0930    0.4385   0.7043
Combined Average                0.6628   0.8572    0.4180   0.6250

Table 5.2: Test results

                                                      Epicurious                        Food.com
                                              Observation     Observation      Observation     Observation
                                              User Average    Fixed Threshold  User Average    Fixed Threshold
                                              MAE     RMSE    MAE     RMSE     MAE     RMSE    MAE     RMSE
User Avg + User Standard Deviation            0.8217  1.0606  0.7759  1.0283   0.4448  0.6812  0.4287  0.6624
Item Avg + Item Standard Deviation            0.8914  1.1550  0.8388  1.1106   0.4561  0.7251  0.4507  0.7207
User/Item Avg + User and Item Std. Deviation  0.8304  1.0296  0.7824  0.9927   0.4390  0.6506  0.4324  0.6449
Min-Max                                       0.8539  1.1533  0.7721  1.0705   0.6648  0.9847  0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods correspond to the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
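A sketch of the fixed-threshold prototype construction, in the standard Rocchio form; the alpha/beta weights and the dictionary-based sparse vectors are assumptions, since the exact formulation is given in Section 4.3, outside this excerpt:

```python
def rocchio_prototype(rated_recipes, threshold=3, alpha=1.0, beta=1.0):
    """Build a user prototype vector from (feature_vector, rating) pairs,
    using a fixed rating threshold to split positive and negative
    observations. Standard Rocchio form; alpha/beta are assumptions."""
    pos = [vec for vec, rating in rated_recipes if rating > threshold]
    neg = [vec for vec, rating in rated_recipes if rating <= threshold]
    features = {f for vec, _ in rated_recipes for f in vec}
    prototype = {}
    for f in features:
        p = sum(v.get(f, 0.0) for v in pos) / len(pos) if pos else 0.0
        n = sum(v.get(f, 0.0) for v in neg) / len(neg) if neg else 0.0
        prototype[f] = alpha * p - beta * n
    return prototype

proto = rocchio_prototype([({"garlic": 1.0, "pasta": 1.0}, 5),
                           ({"tofu": 1.0}, 2)])
print(proto["garlic"], proto["tofu"])  # 1.0 -1.0
```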
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
and adjusted to return the best recommendations.

Table 5.3: Feature testing

                                     Epicurious          Food.com
                                   MAE      RMSE      MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927    0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012    0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986    0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616    0.4324   0.7087
Ingredients                        0.7932   1.0054    0.4411   0.6537
Cuisine                            0.8553   1.0810    0.5357   0.7431
Dietary                            0.8772   1.0807    0.4579   0.7320
5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the best performing method combination was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
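The three-vector representation can be sketched as follows (hypothetical data; the merge is a plain union because the three vectors cover disjoint feature sets):

```python
def merge_prototypes(user_vectors, feature_kinds):
    """Merge the per-feature prototype vectors stored for a user into one
    vector for a given feature combination. A plain union suffices because
    the three vectors cover disjoint feature sets."""
    merged = {}
    for kind in feature_kinds:
        merged.update(user_vectors[kind])
    return merged

# Hypothetical stored vectors for one user:
user_vectors = {
    "ingredients": {"garlic": 0.8, "tofu": -0.4},
    "cuisines":    {"italian": 0.5},
    "dietaries":   {"vegan": -0.2},
}
print(merge_prototypes(user_vectors, ["ingredients", "cuisines"]))
# {'garlic': 0.8, 'tofu': -0.4, 'italian': 0.5}
```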
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case: some features, like, for example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if L ≤ similarity < U
         average rating − standard deviation,  if similarity < L

The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study their impact on the recommendation and discover the similarity thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in the MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

Rating = average rating + standard deviation,  if similarity ≥ U
         average rating,                       if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE; with the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE was 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
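A threshold sweep of this kind can be sketched as follows, using the simplified rule of Eq. 5.1; the `cases` tuples are toy data, not thesis results:

```python
import math

def sweep_upper_threshold(cases, thresholds):
    """For each candidate upper threshold U, predict with the simplified
    rule of Eq. 5.1 and report (U, MAE, RMSE). The `cases` tuples hold
    (similarity, avg, std, actual_rating); everything here is toy data."""
    results = []
    for u in thresholds:
        errors = []
        for sim, avg, std, actual in cases:
            predicted = avg + std if sim >= u else avg
            errors.append(predicted - actual)
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results.append((u, mae, rmse))
    return results

cases = [(0.9, 4.0, 0.5, 4.5), (0.3, 4.0, 0.5, 4.0), (0.6, 3.5, 0.5, 3.5)]
for u, mae, rmse in sweep_upper_threshold(cases, [0.25, 0.50, 0.75]):
    print(f"U={u:.2f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```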
5.5 Standard Deviation Impact on the Recommendation Error

When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were observed towards the higher values of the standard deviation, as that would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so, for each round, an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes marking a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
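The simulation protocol of this section can be sketched as follows; the toy model below simply predicts the user's running average rating, standing in for the actual Rocchio component:

```python
def learning_curve(user_reviews, build_model, predict, rounds):
    """Simulate continuous learning: at each round, the first n reviews form
    the training set, the rest the validation set, and the MAE over the
    validation reviews is recorded. Purely illustrative scaffolding."""
    errors = []
    for n in rounds:
        training, validation = user_reviews[:n], user_reviews[n:]
        model = build_model(training)
        errs = [abs(predict(model, item) - rating) for item, rating in validation]
        errors.append(sum(errs) / len(errs))
    return errors

# Toy stand-in for the recommender: predict the user's running average rating.
reviews = [("r%d" % i, r) for i, r in enumerate([5, 4, 5, 3, 4, 5, 4, 4])]
build = lambda training: sum(r for _, r in training) / len(training)
predict = lambda model, item: model
curve = learning_curve(reviews, build, predict, rounds=[1, 2, 4])
print(curve)  # error decreases as more reviews are used for training
```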
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
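Eq. 4.4 itself is outside this excerpt, so the sketch below assumes the usual inverse-document-frequency form, log(N / n_f), applied to recipes:

```python
import math

def inverse_recipe_frequency(feature, recipes):
    """Weight a feature by its rarity across the collection. Assumption:
    Eq. 4.4 is not shown in this excerpt, so the usual IDF-style form
    log(N / n_f) is used here as a stand-in."""
    n = sum(1 for recipe in recipes if feature in recipe)
    return math.log(len(recipes) / n) if n else 0.0

recipes = [{"garlic", "pasta"}, {"garlic", "tofu"}, {"pasta"}, {"saffron"}]
print(round(inverse_recipe_frequency("saffron", recipes), 3))  # rare feature, higher weight
print(round(inverse_recipe_frequency("garlic", recipes), 3))   # common feature, lower weight
```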
After determining the best approach for adapting the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. As these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of day (i.e. lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
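This future-work idea can be sketched as follows (all names hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_rating_by_class(recipe_vec, class_vectors):
    """Predict the rating whose per-class prototype vector is most similar
    to the recipe's feature vector, removing the similarity-to-rating step."""
    return max(class_vectors, key=lambda r: cosine(recipe_vec, class_vectors[r]))

# Hypothetical per-rating class vectors for one user:
classes = {5: {"garlic": 1.0, "pasta": 0.8}, 1: {"tofu": 1.0}}
print(predict_rating_by_class({"garlic": 1.0}, classes))  # 5
```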
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research, and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems: A landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
The user's disliked ingredients I−k are estimated by considering the ingredients in the browsing history with which the user has never cooked.
To evaluate the accuracy of the system when extracting the user's preferences, 100 recipes were used from Cookpad1, one of the most popular recipe search websites in Japan, with one and a half million recipes and 20 million monthly users. From the set of 100 recipes, a list of 10 recipe titles was presented each time, and subjects would choose one recipe they would like to browse completely and one recipe they would like to cook. This process was repeated 10 times, until the set of 100 recipes was exhausted. The labelled data for users' preferences was collected via a questionnaire, with responses coded on a 6-point scale ranging from love to hate. To evaluate the estimation of the user's favourite ingredients, the accuracy, precision, recall, and F-measure for the top N ingredients, sorted by I+k, were computed. The F-measure is computed as follows:
F-measure = (2 × Precision × Recall) / (Precision + Recall)   (3.4)
When focusing on the top ingredient (N = 1), the method extracted the user's favourite ingredient with a precision of 83.3%. However, the recall was very low, namely only 4.5%. The following values for N were tested: 1, 3, 5, 10, 15, and 20. When focusing on the top 20 ingredients (N = 20), although the precision dropped to 60.7%, the recall increased to 61%, since the average number of an individual user's favourite ingredients is 19.2. Also, with N = 20, the highest F-measure was recorded, with a value of 60.8%. The authors concluded that, for this specific case, the system should focus on the top 20 ingredients sorted by I+k for recipe recommendation. The extraction of the user's disliked ingredients is not explained here in more detail because the accuracy values obtained from the evaluation were not satisfactory.
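The top-N evaluation described above can be sketched as follows. The function and variable names are illustrative, not taken from the original system; the liked-ingredient set would come from the questionnaire labels.

```python
def topn_metrics(ranked_ingredients, liked, n):
    """Precision, recall and F-measure for the top-N ranked ingredients."""
    top = set(ranked_ingredients[:n])
    hits = len(top & liked)
    precision = hits / n
    recall = hits / len(liked)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With a small N, a single correct hit yields high precision but low recall, exactly the trade-off reported above.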
In this work, the recipes' score is determined by whether the ingredients exist in them or not. This means that two recipes composed of the same set of ingredients have exactly the same score, even if they contain different ingredient proportions. This method does not correspond to real eating habits, e.g., if a specific user does not like an ingredient k contained in both recipes, the recipe with the higher quantity of k should have a lower score. To improve this method, an extension of this work was published in 2014 [21], using the same methods to estimate the user's preferences. When performing a recommendation, the system now also considers the ingredient quantity of a target recipe.
When considering ingredient proportions, the impact on a recipe of 100 grams of two different
1 http://cookpad.com
ingredients cannot be considered equivalent, i.e., 100 grams of pepper has a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σ_k = sqrt( (1/n) Σ_{i=1}^{n} (g_k(i) − ḡ_k)² )   (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, g_k(i) denotes the quantity of the ingredient k in recipe i, and ḡ_k represents the average of g_k(i) (i.e., the average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight W_k is assigned to the ingredient. The recipe's final score R is computed considering the weight W_k and the user's liked and disliked ingredients I_k (i.e., I+k and I−k, respectively):
Score(R) = Σ_{k∈R} (I_k · W_k)   (3.6)
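A minimal sketch of this scoring scheme, assuming ingredient quantities are available per recipe; the helper names are illustrative and not taken from [21].

```python
import math

def ingredient_std(quantities):
    """Population standard deviation of an ingredient's quantity across
    the recipes that contain it (the formula above)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((g - mean) ** 2 for g in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Final recipe score: sum over the recipe's ingredients of the user's
    preference I_k times the ingredient weight W_k."""
    return sum(preference.get(k, 0.0) * weight.get(k, 0.0)
               for k in recipe_ingredients)
```

Ingredients the user dislikes carry a negative preference value, so a heavily weighted disliked ingredient pulls the recipe's score down.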
The approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that data sparsity has on prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings, between 0 and 5, as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated
movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF essentially consists in performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u where available, and of those predicted by the content-based method otherwise:
v_{u,i} = { r_{u,i}   if user u rated item i
          { c_{u,i}   otherwise                  (3.7)
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if only a few items were rated. Therefore, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
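The construction of the dense pseudo ratings matrix can be sketched as below. Here `content_pred` stands in for the content-based predictor, which is not reproduced; the dict-based layout is an assumption for illustration.

```python
def pseudo_ratings(users, items, actual, content_pred):
    """Dense pseudo ratings matrix V: the actual rating where one exists,
    the content-based prediction otherwise."""
    return {u: {i: actual[(u, i)] if (u, i) in actual else content_pred(u, i)
                for i in items}
            for u in users}
```

Once V is dense, the usual Pearson-based neighborhood computation can run on it without the sparsity problems of the raw ratings matrix.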
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations
of collaborative filtering and content-based methods since it has been shown to perform consistently
better than pure collaborative filtering
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
As a baseline, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down the recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ratings obtained after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered, under the assumption that common items in recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as the evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user to obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story on a new topic is needed for the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:
sim(a, b) = [ Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) ] / sqrt[ Σ_{p∈P} (r_{a,p} − r̄_a)² · Σ_{p∈P} (r_{b,p} − r̄_b)² ]   (4.1)
where a and b are recipes, r_{a,p} is the rating given by user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, r̄_a and r̄_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:
pred(u, a) = [ Σ_{b∈N} sim(a, b) · (r_{u,b} − r̄_b) ] / Σ_{b∈N} sim(a, b)   (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then normalized by the sum of the similarities.
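The two formulas above can be sketched as follows. Ratings are held as a dict of per-user dicts, an illustrative layout rather than the YoLP implementation; item means are taken over all users who rated the item.

```python
import math

def item_pearson(ratings, a, b):
    """Pearson correlation between items a and b, computed over the users
    who rated both items."""
    shared = [u for u, r in ratings.items() if a in r and b in r]
    if not shared:
        return 0.0
    def item_mean(item):
        vals = [r[item] for r in ratings.values() if item in r]
        return sum(vals) / len(vals)
    ma, mb = item_mean(a), item_mean(b)
    num = sum((ratings[u][a] - ma) * (ratings[u][b] - mb) for u in shared)
    den = (math.sqrt(sum((ratings[u][a] - ma) ** 2 for u in shared)) *
           math.sqrt(sum((ratings[u][b] - mb) ** 2 for u in shared)))
    return 0.0 if den == 0 else num / den
```

The prediction step then weights the deviation of each of the user's ratings by its item-to-item similarity to the target and normalizes by the sum of similarities.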
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
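A sketch of profile construction and matching, assuming recipe features arrive as simple lists of feature strings; the exact sparse-vector layout used by YoLP is not reproduced here.

```python
import math

def build_profile(rated_recipes):
    """Add each feature of every positively rated recipe (rating of 4 or 5)
    to the profile as a binary value."""
    profile = {}
    for features, rating in rated_recipes:
        if rating >= 4:
            for f in features:
                profile[f] = 1.0
    return profile

def cosine(u, v):
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)
```

Candidate recipes would be scored with `cosine(recipe_vector, profile)` and sorted from most to least similar.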
YoLP recipe recommendations take the form of a list, and, in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating = { avgTotal + 0.5   if similarity > 0.8
         { avgTotal          otherwise              (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to assign weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to assign weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)   (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
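Computing the IRF weights over a recipe corpus might look like the sketch below; representing the corpus as a list of ingredient sets is an assumption for illustration.

```python
import math

def irf_weights(recipes):
    """Inverse Recipe Frequency per ingredient: log(M / M_k), where M is the
    total number of recipes and M_k the number containing ingredient k."""
    M = len(recipes)
    counts = {}
    for ingredients in recipes:
        for k in ingredients:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```

As with IDF, an ingredient present in every recipe gets weight zero, while rare ingredients get the highest weights.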
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, where ratings range from 1 to 5, the same process is applied, with the exception of ratings equal to 3; in this case, these are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
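The prototype update can be sketched as follows. The positive/negative split here uses a fixed rating threshold, as in the first approach above; neutral ratings are not handled in this sketch, and the names are illustrative.

```python
def build_prototype(rated_recipes, weights, positive_threshold):
    """Accumulate the user's prototype vector: add the IRF feature weights
    on positive observations and subtract them on negative ones."""
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= positive_threshold else -1.0
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * weights.get(f, 0.0)
    return prototype
```

A feature the user has rated positively and negatively an equal number of times cancels out to a weight near zero.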
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B, which fits in the range [C, D], as shown in the following formula3:
B = (A − minimum value of A) / (maximum value of A − minimum value of A) × (D − C) + C   (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as the default for the recommendation.
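The per-user Min-Max step, with the fallback to the user average when no similarity interval can be formed, can be sketched as below; the parameter names are illustrative.

```python
def min_max_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Map a similarity value into the user's rating range via Min-Max
    normalization (the formula above)."""
    if sim_max == sim_min:
        # not enough ratings to form a similarity interval: default to the average
        return user_avg
    return ((sim - sim_min) / (sim_max - sim_min)
            * (rating_max - rating_min) + rating_min)
```

For a user whose similarities span [0, 1] and whose ratings span [1, 5], a similarity of 0.5 maps to the midpoint rating of 3.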
Using average and standard deviation values from training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:
Rating = { average rating + standard deviation   if similarity ≥ U
         { average rating                         if L ≤ similarity < U
         { average rating − standard deviation    if similarity < L       (4.6)
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user's and the recipe's averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Foodcom    Epicurious
Number of users                        24,741         8,117
Number of food items                  226,025        14,976
Number of rating events               956,826        86,574
Number of ratings above avg           726,467        46,588
Number of groups                          108            68
Number of ingredients                   5,074           338
Number of categories                       28            14
Sparsity on the ratings matrix          0.02%         0.07%
Avg rating values                        4.68          3.34
Avg number of ratings per user          38.67         10.67
Avg number of ratings per item           4.23          5.78
Avg number of ingredients per item       8.57          3.71
Avg number of categories per item        2.33          0.60
Avg number of food groups per item       0.87          0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
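The threshold rule above, with the initial values of U and L as defaults, can be sketched as:

```python
def threshold_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Shift the average rating by one standard deviation up or down,
    depending on where the similarity falls relative to U and L."""
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity < lower:
        return avg_rating - std_dev
    return avg_rating
```

The `avg_rating` and `std_dev` arguments can come from the user, the recipe, or their combination, matching the three approaches tested.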
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes, but, in order to reduce data sparsity, the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value
Figure 4.4: Distribution of Foodcom rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
In Figures 4.3, 4.4, and 4.5, some graphical statistical data about the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
32
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, a leave-p-out style of cross-validation was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different sets of p observations as the validation set. Ideally, the process is repeated until all possible combinations of p are tested, and the validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds and the process repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
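The 5-fold partition described above can be sketched as follows (a minimal illustration, not the thesis code; function and variable names are ours, and rating events are assumed to be (userID, itemID, rating) tuples):

```python
import random

def five_fold_splits(ratings, seed=42):
    """Partition rating events into 5 disjoint folds.

    Each fold in turn serves as the validation set (20% of the data),
    while the remaining four folds (80%) form the training set.
    """
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    for i in range(5):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation
```

The error metrics obtained on each of the 5 validation sets would then be averaged to produce the final figures.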
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

bull User identification: userID

bull Item identification: itemID

bull Rating attributed by the userID to the itemID: rating

Figure 5.1: 10-Fold Cross-Validation example
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
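Both measures can be sketched in a few lines (an illustrative sketch; the function names are ours):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: the average magnitude of the rating deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on large errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Because of the squaring, RMSE is always at least as large as MAE on the same predictions, a property that matters when interpreting the threshold tests later in this chapter.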
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious          Food.com
                                 MAE      RMSE       MAE      RMSE
YoLP Content-based component     0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component     0.6454   0.8678     0.3761   0.6834
User Average                     0.6315   0.8338     0.4077   0.6207
Item Average                     0.7701   1.0930     0.4385   0.7043
Combined Average                 0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                          Epicurious                              Food.com
                                          Observation        Observation         Observation        Observation
                                          User Average       Fixed Threshold     User Average       Fixed Threshold
                                          MAE      RMSE      MAE      RMSE       MAE      RMSE      MAE      RMSE
User Avg + User Standard Deviation        0.8217   1.0606    0.7759   1.0283     0.4448   0.6812    0.4287   0.6624
Item Avg + Item Standard Deviation        0.8914   1.1550    0.8388   1.1106     0.4561   0.7251    0.4507   0.7207
User/Item Avg + User and Item Std. Dev.   0.8304   1.0296    0.7824   0.9927     0.4390   0.6506    0.4324   0.6449
Min-Max                                   0.8539   1.1533    0.7721   1.0705     0.6648   0.9847    0.6303   0.9384
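The three average-based baselines can be sketched as follows (an illustrative sketch under the same (userID, itemID, rating) tuple assumption; names are ours):

```python
from collections import defaultdict

def build_averages(training):
    """Compute per-user and per-item average ratings from the training set."""
    user_ratings, item_ratings = defaultdict(list), defaultdict(list)
    for user, item, rating in training:
        user_ratings[user].append(rating)
        item_ratings[item].append(rating)
    user_avg = {u: sum(v) / len(v) for u, v in user_ratings.items()}
    item_avg = {i: sum(v) / len(v) for i, v in item_ratings.items()}
    return user_avg, item_avg

def combined_baseline(user_avg, item_avg, user, item):
    """Predict (UserAvg + ItemAvg) / 2 for a (user, item) pair."""
    return (user_avg[user] + item_avg[item]) / 2
```

Returning `user_avg[user]` or `item_avg[item]` directly gives the other two baselines.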
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
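Under the fixed-threshold variant, building a prototype vector can be sketched as follows (a minimal illustration, not the thesis implementation; the function name, the unit Rocchio weights, and treating ratings strictly above the threshold as positive are all assumptions, with recipes represented as sparse dicts of feature weights):

```python
def rocchio_prototype(rated_recipes, threshold=3):
    """Build a user's prototype vector from rated recipe feature vectors.

    Recipes rated above `threshold` are treated as positive observations,
    the rest as negative; the prototype is the centroid of the positive
    vectors minus the centroid of the negative ones (unit Rocchio weights).
    """
    positives = [vec for vec, rating in rated_recipes if rating > threshold]
    negatives = [vec for vec, rating in rated_recipes if rating <= threshold]
    prototype = {}
    for group, sign in ((positives, 1.0), (negatives, -1.0)):
        if not group:
            continue
        weight = sign / len(group)  # centroid scaling for this group
        for vec in group:
            for feature, value in vec.items():
                prototype[feature] = prototype.get(feature, 0.0) + weight * value
    return prototype
```

In the user-average variant, `threshold` would simply be replaced by that user's mean rating.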
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616     0.4324   0.7087
Ingredients                         0.7932   1.0054     0.4411   0.6537
Cuisine                             0.8553   1.0810     0.5357   0.7431
Dietary                             0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

bull Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated: in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
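The merge-then-compare step can be sketched as follows (an illustrative sketch; names are ours, sparse vectors are dicts, and feature keys are assumed disjoint across the three feature types):

```python
import math

def merge_vectors(*vectors):
    """Merge per-feature-type vectors (e.g. ingredients, cuisine, dietary)
    into one prototype; duplicate keys would be overwritten, so the
    feature spaces are assumed disjoint."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing a feature combination then amounts to choosing which of the three stored vectors to pass to `merge_vectors`, with no retraining.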
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L          (4.6)
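This three-way mapping can be sketched directly in code (an illustrative sketch; the function name is ours, and the defaults use the initial threshold values of 0.75 for U and 0.25 for L):

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Map a Rocchio similarity value onto the rating scale (Eq. 4.6).

    `average` and `std_dev` may be the user's, the item's, or a
    combination of both, depending on the variant being tested.
    """
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev
```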
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U          (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although the exact rating value is predicted more often, in the cases where the system misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest error values registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably there was not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and analyse whether the recommendation error starts to converge after a determined amount of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning Curve using the Foodcom dataset, up to 100 rated recipes

Figure 5.10: Learning Curve using the Foodcom dataset, up to 500 rated recipes
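The round-by-round simulation can be sketched generically as follows (an illustrative sketch; `update` and `predict` stand in for the actual prototype-building and rating-prediction steps, and a trivial averaging model is used only to demonstrate the loop):

```python
def learning_curve(user_ratings, predict, update, initial=1):
    """Simulate incremental learning for one user.

    Starting from `initial` training reviews, each round moves one more
    review from the validation set into the training set, retrains via
    `update`, and records the mean absolute error on the remainder.
    """
    errors = []
    for n in range(initial, len(user_ratings)):
        training, validation = user_ratings[:n], user_ratings[n:]
        model = update(training)
        abs_errors = [abs(rating - predict(model, item)) for item, rating in validation]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors
```

Averaging these per-user curves over all users in the selected group yields graphs of the kind shown in Figures 5.8 to 5.10.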
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example: season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
ingredients cannot be considered equivalent; i.e., 100 grams of pepper have a higher impact on a recipe than 100 grams of potato, as the variation from the usual observed quantity of the ingredient pepper is higher. Therefore, the scoring method proposed in this work is based on the standard quantity and the quantity dispersion of each ingredient. The standard deviation of an ingredient k is obtained as follows:
σk = sqrt( (1/n) · Σ_{i=1..n} ( gk(i) − ḡk )² )          (3.5)
In the formula, n denotes the number of recipes that contain ingredient k, gk(i) denotes the quantity of the ingredient k in recipe i, and ḡk represents the average of gk(i) (i.e., the previously computed average quantity of the ingredient k over all the recipes in the database). According to the deviation score, a weight Wk is assigned to the ingredient. The recipe's final score R is computed considering the weight Wk and the user's liked and disliked ingredients Ik (i.e., I⁺k and I⁻k, respectively):
Score(R) = Σ_{k∈R} ( Ik · Wk )          (3.6)
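The two computations above can be sketched as follows (an illustrative sketch of Eqs. 3.5 and 3.6, not the original authors' code; data structures and names are ours):

```python
import math

def ingredient_std(quantities):
    """Population standard deviation of an ingredient's quantity
    across the n recipes that contain it (Eq. 3.5)."""
    n = len(quantities)
    mean = sum(quantities) / n
    return math.sqrt(sum((q - mean) ** 2 for q in quantities) / n)

def recipe_score(recipe_ingredients, preference, weight):
    """Score(R): sum over ingredients k in R of I_k * W_k (Eq. 3.6),
    where preference[k] encodes the user's like/dislike of k and
    weight[k] is the deviation-based weight W_k."""
    return sum(preference[k] * weight[k] for k in recipe_ingredients)
```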
The approach inspired by TF-IDF, shown in Eq. 3.1 and used to estimate the user's favourite ingredients, is an interesting point to further explore in restaurant food recommendation. In Chapter 4, a possible extension to this method is described in more detail.
3.2 Content-Boosted Collaborative Recommendation
A previous article presents a framework that combines content-based and collaborative methods [20], showing that Content-Boosted Collaborative Filtering (CBCF) performs better than a pure content-based predictor, a pure collaborative filtering method, and a naive hybrid of the two. It is also shown that CBCF overcomes the first-rater problem of collaborative filtering and significantly reduces the impact that sparse data has on the prediction accuracy. The domain of movie recommendations was used to demonstrate this hybrid approach.
In the pure content-based method, the prediction task was treated as a text-categorization problem. The movie content information was viewed as a document, and the user ratings between 0 and 5 as one of six class labels. A naive Bayesian text classifier was implemented and extended to represent movies. Each movie is represented by a set of features (e.g., title, cast, etc.), where each feature is viewed as a set of words. The classifier was used to learn a user profile from a set of rated movies, i.e., labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed as the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u where available, and of those predicted by the content-based method otherwise:
v_{u,i} = r_{u,i},   if user u rated item i
          c_{u,i},   otherwise                    (3.7)
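The construction of a pseudo user-ratings vector can be sketched as follows (an illustrative sketch of Eq. 3.7; names and data structures are ours, with the user's actual ratings and the content-based predictions held in dicts):

```python
def pseudo_ratings(actual, content_predictions, items):
    """Build the dense pseudo user-ratings vector of Eq. 3.7:
    use the real rating r_{u,i} where the user provided one,
    otherwise fall back to the content-based prediction c_{u,i}."""
    return [actual.get(item, content_predictions[item]) for item in items]
```

Stacking these vectors for all users yields the dense pseudo ratings matrix V described next.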
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has rated only a few items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
The MAE was one of two metrics used to evaluate the accuracy of a prediction algorithm. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE measures of 1.002 and 1.059, respectively. The MAE value of the naive hybrid approach was 1.011.
CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe–ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated based on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
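This averaging decomposition can be sketched as follows (an illustrative sketch of the idea, not the original authors' code; rated recipes are assumed to be (ingredient list, rating) pairs):

```python
from collections import defaultdict

def ingredient_scores(rated_recipes):
    """Break recipes down into ingredients: each ingredient's score is
    the average rating of the recipes in which it occurs."""
    totals = defaultdict(lambda: [0.0, 0])  # ingredient -> [rating sum, count]
    for ingredients, rating in rated_recipes:
        for ingredient in ingredients:
            totals[ingredient][0] += rating
            totals[ingredient][1] += 1
    return {ing: total / count for ing, (total, count) in totals.items()}
```

Recipe predictions are then reconstructed by combining the scores of a recipe's ingredients.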
As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down recipes into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered. The assumption is that the items common to recipes with mixed ratings are not the cause of the high variation in scores. The results of the study are presented in Figure 3.2, using the normalized
Figure 3.2: Normalized MAE score for recipe recommendation [22].
MAE as an evaluation metric.
This work shows that the content-based approach has, in this case, the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implements a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors influencing a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help the user gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify current user interests, Daily-Learner uses the nearest neighbor algorithm to model users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average of all the voting stories' scores, using the similarity values as weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation for a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendation as well, since users are usually not interested in recommendations whose contents are too similar to dishes recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System architecture.
Figure 4.2: Item-to-item collaborative recommendation2.
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (4.2)
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a, and the result is normalized by the sum of similarities.
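As an illustration of Eqs. 4.1 and 4.2, the computation can be sketched as follows. This is a minimal Python sketch with illustrative data structures and names, not the actual YoLP implementation; note that the target item's average rating is added back at the end of the prediction step, a common variant used to turn the weighted deviation into a rating value.

```python
import math

# Toy ratings matrix: item -> {user: rating}. All names are illustrative.
def item_avg(R, i):
    """Average rating of item i over all users who rated it."""
    return sum(R[i].values()) / len(R[i])

def sim(R, a, b):
    """Eq. 4.1: Pearson correlation between items a and b over the users P
    who rated both items."""
    P = R[a].keys() & R[b].keys()
    ra, rb = item_avg(R, a), item_avg(R, b)
    num = sum((R[a][p] - ra) * (R[b][p] - rb) for p in P)
    den = math.sqrt(sum((R[a][p] - ra) ** 2 for p in P) *
                    sum((R[b][p] - rb) ** 2 for p in P))
    return num / den if den else 0.0

def pred(R, u, a):
    """Eq. 4.2: similarity-weighted average of the user's mean-centred ratings
    over the items N the user has rated; the target item's average is added
    back here so the result lies on the rating scale."""
    N = [b for b in R if b != a and u in R[b]]
    num = sum(sim(R, a, b) * (R[b][u] - item_avg(R, b)) for b in N)
    den = sum(sim(R, a, b) for b in N)
    return item_avg(R, a) + (num / den if den else 0.0)
```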
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
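The profile construction and comparison described above can be sketched as follows, representing the binary vectors as Python sets (a simplification that is equivalent for cosine similarity over binary features; the 4-or-5 positive-rating rule follows the text, while the function names and feature labels are illustrative):

```python
import math

def build_profile(rated_recipes):
    """Collect the features of positively rated recipes (rating 4 or 5).
    `rated_recipes` is a list of (rating, set_of_features) pairs."""
    profile = set()
    for rating, features in rated_recipes:
        if rating >= 4:                 # positive rating: add its features
            profile |= features
    return profile

def cosine(profile, recipe_features):
    """Cosine similarity between two binary vectors represented as sets:
    |intersection| / sqrt(|profile| * |recipe|)."""
    if not profile or not recipe_features:
        return 0.0
    inter = len(profile & recipe_features)
    return inter / math.sqrt(len(profile) * len(recipe_features))
```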
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
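Eq. 4.3 translates directly into code, as in the following sketch (the function name is illustrative):

```python
def similarity_to_rating(similarity, user_avg, item_avg):
    """Eq. 4.3: map a cosine similarity to a rating via the combined user
    and item average; highly similar recipes receive a small bonus."""
    avg_total = (user_avg + item_avg) / 2
    return avg_total + 0.5 if similarity > 0.8 else avg_total
```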
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. 3.1 and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined from the complete dataset.
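Eq. 4.4 can be computed over the whole dataset in a single pass, as in the following sketch (recipes are represented as sets of features; all names are illustrative):

```python
import math

def irf_weights(recipes):
    """Eq. 4.4: IRF_k = log(M / M_k), with M the total number of recipes and
    M_k the number of recipes containing feature k.
    `recipes` is a list of feature sets, one per recipe."""
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in features:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```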
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
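The construction just described can be sketched as follows, using the fixed-threshold variant to decide between positive and negative observations (a simplified sketch; neutral ratings, as in the Food.com case, would simply be skipped, and all names are illustrative):

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Rocchio-style prototype: for each rated recipe, add the IRF weight of
    every feature on a positive observation (rating >= threshold) and
    subtract it on a negative one; both observation types weigh 1."""
    prototype = {}
    for rating, features in rated_recipes:
        sign = 1 if rating >= threshold else -1
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype
```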
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B, which fits in the range [C, D], as shown in the following formula3:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max value of A - min value of A), the user average was used as the default for the recommendation.
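The per-user mapping described above can be sketched as follows (a minimal sketch: `user_sims` and `user_ratings` stand for the user's observed similarity and rating values, and the fallback to the user average covers the degenerate-interval case mentioned in the text):

```python
def min_max_rating(sim, user_sims, user_ratings):
    """Eq. 4.5 applied per user: map `sim` from the user's observed
    similarity range [min A, max A] into the rating range [C, D]."""
    a_min, a_max = min(user_sims), max(user_sims)
    c, d = min(user_ratings), max(user_ratings)
    if a_max == a_min:
        # Not enough spread to define the interval: default to the average.
        return sum(user_ratings) / len(user_ratings)
    return (sim - a_min) / (a_max - a_min) * (d - c) + c
```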
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combination of the user's and the recipe's averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization of the datasets used in the experiments.

                                      Food.com   Epicurious
  Number of users                       24,741        8,117
  Number of food items                 226,025       14,976
  Number of rating events              956,826       86,574
  Number of ratings above avg          726,467       46,588
  Number of groups                         108           68
  Number of ingredients                  5,074          338
  Number of categories                      28           14
  Sparsity on the ratings matrix         0.02%        0.07%
  Avg rating values                       4.68         3.34
  Avg number of ratings per user         38.67        10.67
  Avg number of ratings per item          4.23         5.78
  Avg number of ingredients per item      8.57         3.71
  Avg number of categories per item       2.33         0.60
  Avg number of food groups per item      0.87         0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
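Eq. 4.6 with these initial thresholds translates directly into code (function and parameter names are illustrative):

```python
def avg_std_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6 with the initial thresholds U = 0.75 and L = 0.25."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```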
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe-sharing community. The second dataset is composed of data crawled from a website named Epicurious5. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
bull Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
bull Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value.
Figure 4.4: Distribution of Food.com rating events per rating value.
bull Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset, because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was partitioned into 5 folds, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example.
in the following format:
bull User identification: userID
bull Item identification: itemID
bull Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
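The two metrics used by the evaluation module translate directly into code, as a straightforward transcription of their definitions:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; larger deviations weigh more heavily than in MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```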
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines.

                                   Epicurious          Food.com
                                 MAE      RMSE      MAE      RMSE
  YoLP Content-based component   0.6389   0.8279    0.3590   0.6536
  YoLP Collaborative component   0.6454   0.8678    0.3761   0.6834
  User Average                   0.6315   0.8338    0.4077   0.6207
  Item Average                   0.7701   1.0930    0.4385   0.7043
  Combined Average               0.6628   0.8572    0.4180   0.6250
Table 5.2: Test results.

                                           Epicurious                          Food.com
                                  Observation      Observation        Observation      Observation
                                  User Average     Fixed Threshold    User Average     Fixed Threshold
                                  MAE     RMSE     MAE     RMSE       MAE     RMSE     MAE     RMSE
  User Avg + User Standard
  Deviation                       0.8217  1.0606   0.7759  1.0283     0.4448  0.6812   0.4287  0.6624
  Item Avg + Item Standard
  Deviation                       0.8914  1.1550   0.8388  1.1106     0.4561  0.7251   0.4507  0.7207
  User/Item Avg + User and
  Item Standard Deviation         0.8304  1.0296   0.7824  0.9927     0.4390  0.6506   0.4324  0.6449
  Min-Max                         0.8539  1.1533   0.7721  1.0705     0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features.

                                        Epicurious          Food.com
                                      MAE      RMSE      MAE      RMSE
  Ingredients + Cuisine + Dietaries   0.7824   0.9927    0.4324   0.6449
  Ingredients + Cuisine               0.7915   1.0012    0.4384   0.6502
  Ingredients + Dietary               0.7874   0.9986    0.4342   0.6468
  Cuisine + Dietary                   0.8266   1.0616    0.4324   0.7087
  Ingredients                         0.7932   1.0054    0.4411   0.6537
  Cuisine                             0.8553   1.0810    0.5357   0.7431
  Dietary                             0.8772   1.0807    0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the best-performing method combination was the following:
bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
bull Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
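The storage scheme just described can be sketched as follows: each user keeps one weight vector per feature type, and a feature combination is evaluated by merging only the selected vectors (names are illustrative; the merge relies on feature keys being disjoint across the three types):

```python
def merge_prototypes(stored, active_features):
    """Combine the per-feature-type vectors selected for a test.
    `stored` maps a feature type name to that user's weight vector."""
    merged = {}
    for name in active_features:
        merged.update(stored[name])   # feature keys are disjoint across types
    return merged
```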
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about them is available; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.
Rating =
    average rating + standard deviation,   if similarity >= U
    average rating,                        if similarity < U
(5.1)
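As a plain-Python sketch, the updated mapping of Eq. 5.1 can be written as follows (function and variable names are illustrative, not the thesis implementation):

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75):
    """Map a Rocchio similarity score to a rating (Eq. 5.1).

    When the similarity reaches the upper threshold U, the predicted
    rating is boosted by the standard deviation; otherwise the plain
    average rating is returned."""
    if similarity >= upper:
        return avg_rating + std_dev
    return avg_rating

# Example: a user with average rating 3.5 and standard deviation 0.8.
print(similarity_to_rating(0.9, 3.5, 0.8))  # above U -> 4.3
print(similarity_to_rating(0.4, 3.5, 0.8))  # below U -> 3.5
```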
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
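The MAE/RMSE trade-off described above can be illustrated with a small sketch; the two metrics follow their standard definitions, and the rating lists are invented for illustration:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of the absolute deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large deviations more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 4, 4, 4]
# Higher threshold: small errors spread across all predictions.
conservative = [3, 3, 3, 3]
# Lower threshold: more exact hits, but one large miss.
aggressive = [4, 4, 4, 0]

print(mae(actual, conservative), rmse(actual, conservative))  # 1.0 1.0
print(mae(actual, aggressive), rmse(actual, aggressive))      # 1.0 2.0
```

Both prediction lists have the same MAE, but the occasional large miss doubles the RMSE, which mirrors the behaviour observed when lowering U.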
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
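One way to derive the per-user points plotted in these figures is sketched below; the review data is invented, and each user contributes a standard deviation of actual ratings and a mean absolute error:

```python
from statistics import mean, pstdev

# Toy review data: user -> list of (actual_rating, predicted_rating).
reviews = {
    "alice": [(4, 4), (4, 3), (4, 4)],          # zero-variance rater
    "bob":   [(1, 2), (5, 4), (3, 3), (5, 5)],  # high-variance rater
}

points = {}
for user, pairs in reviews.items():
    ratings = [a for a, _ in pairs]
    abs_err = mean(abs(a - p) for a, p in pairs)
    # One scatter point per user: (standard deviation, absolute error).
    points[user] = (pstdev(ratings), abs_err)

for user, (std, err) in points.items():
    print(f"{user}: std={std:.2f} abs_err={err:.2f}")
```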
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimension of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
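The round-by-round simulation described above can be sketched as follows; the `train` and `predict` stand-ins are deliberately trivial placeholders (a running mean of the user's ratings), not the Rocchio components used in this work:

```python
from statistics import mean

def learning_curve(user_reviews, train_fn, predict_fn, max_rounds):
    """Simulate incremental learning: at round n, the first n reviews of
    each user are training data and the remainder form the validation set."""
    errors = []
    for n in range(1, max_rounds + 1):
        round_errors = []
        for reviews in user_reviews:          # one (recipe, rating) list per user
            if len(reviews) <= n:
                continue
            training, validation = reviews[:n], reviews[n:]
            profile = train_fn(training)      # build the user profile
            round_errors += [abs(rating - predict_fn(profile, recipe))
                             for recipe, rating in validation]
        errors.append(mean(round_errors))     # average error for this round
    return errors

# Placeholder components: the "profile" is the mean of the training ratings,
# and the prediction ignores the recipe entirely.
train = lambda reviews: mean(r for _, r in reviews)
predict = lambda profile, recipe: profile

users = [[("r%d" % i, r) for i, r in enumerate([4, 5, 4, 3, 4, 4])]]
print(learning_curve(users, train, predict, 3))
```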
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors and, adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of day (i.e. lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems – a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
movies, i.e. labelled documents. Finally, the learned profile is used to predict the label (rating) of unrated movies.
The pure collaborative filtering method implemented in this study uses a neighborhood-based algorithm. User-to-user similarity is computed with the Pearson correlation coefficient, and the prediction is computed with the weighted average of deviations from the neighbors' means. Both these methods were explained in more detail in Section 2.1.2.
The naive hybrid approach generates predictions by averaging the ratings produced by the pure content-based predictor and the pure collaborative method.
CBCF basically consists of performing a collaborative recommendation with less data sparsity. This is achieved by creating a pseudo user-ratings vector v_u for every user u in the database. The pseudo user-ratings vector consists of the item ratings provided by the user u where available, and of those predicted by the content-based method otherwise:

v_{u,i} =
    r_{u,i},   if user u rated item i
    c_{u,i},   otherwise
(3.7)
Using the pseudo user-ratings vectors of all users, the dense pseudo ratings matrix V is created. The similarity between users is then computed with the Pearson correlation coefficient. The accuracy of a pseudo user-ratings vector clearly depends on the number of movies the user has rated: if the user has rated many items, the content-based predictions are significantly better than if he has only a few rated items. Lastly, the prediction is computed using a hybrid correlation weight that allows similar users with more accurate pseudo vectors to have a higher impact on the predicted rating. The hybrid correlation weight is explained in more detail in [20].
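A minimal sketch of the pseudo user-ratings vector from Eq. 3.7, with an invented placeholder standing in for the content-based predictor:

```python
def pseudo_ratings(user_ratings, content_predict, items):
    """Build the dense pseudo user-ratings vector (Eq. 3.7): the actual
    rating where the user rated the item, a content-based prediction
    otherwise. `content_predict` stands in for the pure content-based
    predictor."""
    return [user_ratings.get(i, content_predict(i)) for i in items]

# Invented data: the user rated items "a" and "c"; "b" is filled in
# by a constant placeholder prediction.
ratings = {"a": 5, "c": 2}
print(pseudo_ratings(ratings, lambda item: 3, ["a", "b", "c"]))  # [5, 3, 2]
```

Stacking these vectors for every user yields the dense pseudo ratings matrix V used by the collaborative step.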
The MAE was one of two metrics used to evaluate the accuracy of the prediction algorithms. The content-boosted collaborative filtering system presented the best results, with a MAE of 0.962. The pure collaborative filtering and content-based methods presented MAE values of 1.002 and 1.059, respectively, and the MAE of the naive hybrid approach was 1.011.

CBCF is an important approach to consider when looking to overcome the individual limitations of collaborative filtering and content-based methods, since it has been shown to perform consistently better than pure collaborative filtering.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article has studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.
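This simplistic ingredient scoring can be sketched as follows (the recipe data is invented):

```python
from collections import defaultdict

def ingredient_scores(rated_recipes):
    """Score each ingredient by averaging the ratings of the recipes in
    which it occurs. `rated_recipes` is a list of (ingredients, rating)
    pairs."""
    totals = defaultdict(list)
    for ingredients, rating in rated_recipes:
        for ing in ingredients:
            totals[ing].append(rating)
    return {ing: sum(r) / len(r) for ing, r in totals.items()}

recipes = [({"beef", "onion"}, 5), ({"onion", "rice"}, 3)]
scores = ingredient_scores(recipes)
print(scores["onion"])  # appears in a 5-rated and a 3-rated recipe -> 4.0
```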
As a baseline, a random recommender was implemented, which assigns a randomly generated prediction score to each recipe. Five different recommendation strategies were developed for personalized recipe recommendations.
The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks the recipes down into ingredients and assigns ratings to them based on the user's recipe scores; finally, with the user's ingredient scores, a prediction is computed for the recipe.

Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity: the hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores, while in hybrid ingr user similarity is based on the ingredient scores obtained after the recipes are broken down. Lastly, an intelligent strategy was implemented, in which only the positive ratings for items that receive mixed ratings are considered; the assumption is that items common to recipes with mixed ratings are not the cause of the high variation in score. The results of the study are represented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has the best overall performance in this case, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating and could be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction; in some cases, items that are too similar to others already seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others known by the user probably carry the same information and will not help him gather more information about a particular news topic, so these items are excluded from the recommendation. On the other hand, items similar in topic but not in content should make great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new, unlabeled item, the algorithm compares it to all stored items using a similarity function, and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed for the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e. a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e. a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and there is no need to recommend a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].

This issue should be taken into consideration in food recommendation, as users are usually not interested in recommendations with contents too similar to dishes they have recently eaten.
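The voting mechanism described above can be sketched as follows; the thresholds and data are illustrative, not the values used in Daily-Learner:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def predict_score(new_story, rated_stories, min_sim=0.3, max_sim=0.95):
    """Short-term prediction in the style of Daily-Learner: stories above
    `min_sim` vote with similarity-weighted scores; above `max_sim` the
    new story is labeled as already known; with no voters, the story is
    deferred to the long-term model (None here)."""
    voters = [(cosine(new_story, vec), score)
              for vec, score in rated_stories
              if cosine(new_story, vec) >= min_sim]
    if not voters:
        return None
    if any(sim >= max_sim for sim, _ in voters):
        return "known"
    total = sum(sim for sim, _ in voters)
    return sum(sim * score for sim, score in voters) / total

rated = [({"a": 1, "b": 1}, 4), ({"c": 1}, 2)]
print(predict_score({"a": 1}, rated))  # only the first story votes -> 4.0
```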
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed, and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured by the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the same group of users.

¹https://www.python.org/

Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as:

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( √(Σ_{p∈P} (r_{a,p} − r̄_a)²) · √(Σ_{p∈P} (r_{b,p} − r̄_b)²) )   (4.1)

where a and b are recipes, r_{a,p} is the rating from user p for recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, r̄_a and r̄_b are recipe a's and recipe b's average ratings, respectively.

After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = Σ_{b∈N} sim(a, b) · (r_{u,b} − r̄_b) / Σ_{b∈N} sim(a, b)   (4.2)

²http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of similarities.

The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
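A minimal sketch of the item-to-item Pearson similarity from Eq. 4.1; for brevity, the item averages are taken over the co-rating users only, a simplification of the formula above:

```python
import math

def pearson_item_sim(ratings, a, b):
    """Pearson correlation between items a and b (in the spirit of
    Eq. 4.1), computed over the users that rated both items.
    `ratings` maps user -> {item: rating}."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0
    avg = lambda item: sum(ratings[u][item] for u in common) / len(common)
    ra, rb = avg(a), avg(b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in common)
    den = math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in common)) * \
          math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in common))
    return num / den if den else 0.0

# Invented ratings: three users co-rate recipes "a" and "b".
ratings = {"u1": {"a": 5, "b": 4}, "u2": {"a": 3, "b": 2}, "u3": {"a": 4, "b": 5}}
print(round(pearson_item_sim(ratings, "a", "b"), 3))
```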
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e. when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating =
    avgTotal + 0.5,   if similarity > 0.8
    avgTotal,         otherwise
(4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
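A sketch of this content-based scoring, combining cosine similarity over binary feature vectors with the rating transformation of Eq. 4.3 (the vectors and values are invented):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def yolp_cb_rating(profile, recipe, avg_total):
    """Eq. 4.3: boost the combined user/item average by 0.5 when the
    profile and recipe feature vectors are highly similar."""
    return avg_total + 0.5 if cosine(profile, recipe) > 0.8 else avg_total

# Binary feature vectors; the positions are illustrative slots for
# category, region, restaurant ID and ingredient features.
profile = [1, 1, 0, 1]
recipe = [1, 1, 0, 1]
print(yolp_cb_rating(profile, recipe, avg_total=3.7))  # identical vectors -> 4.2
```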
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering the ingredients in a recipe as similar to the words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results in extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature Fk is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRFk = log(M / Mk)              (4.4)
where M is the total number of recipes and Mk is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
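The IRF computation of Eq. (4.4) can be sketched in Python as follows; the representation of recipes as sets of feature names is an assumption made for illustration:

```python
import math

def irf_weights(recipes):
    """Sketch of Eq. (4.4): IRF_k = log(M / M_k), where M is the total
    number of recipes and M_k the number of recipes containing feature k.
    `recipes` is a hypothetical list of feature sets."""
    m = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(m / mk) for k, mk in counts.items()}

recipes = [{"garlic", "chicken"}, {"garlic", "pasta"}, {"beef", "onion"}]
weights = irf_weights(recipes)
# "garlic" appears in 2 of 3 recipes -> log(3/2); "beef" in 1 -> log(3)
```

Rare ingredients thus receive higher weights than ubiquitous ones, mirroring the IDF intuition from text retrieval.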
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. To determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments are explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
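The prototype construction above can be sketched as follows, here using the fixed-threshold variant (rating 3 and above counts as positive); the data layout is an assumption for illustration:

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Sketch of Section 4.3.2: feature weights (IRF values) are added
    for positive observations and subtracted for negative ones.
    `rated_recipes` is a hypothetical list of (feature set, rating) pairs."""
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1 if rating >= threshold else -1  # fixed-threshold variant
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype

irf = {"garlic": 0.4, "beef": 1.1}
rated = [({"garlic"}, 4), ({"beef"}, 1)]
print(build_prototype(rated, irf))  # {'garlic': 0.4, 'beef': -1.1}
```

The alternative approach from the text would replace the fixed `threshold` with each user's average rating computed from the training set.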
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe's feature vector and the user's profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. Of the many normalization methods available, the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:
B = (A - min(A)) / (max(A) - min(A)) * (D - C) + C              (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or share the same notion of high or low rating values. The following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) - min(A)), the user average was used as the default for the recommendation.
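A minimal sketch of this per-user mapping, including the fallback to the user average when the similarity interval is degenerate (parameter names are illustrative):

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, fallback=None):
    """Sketch of Eq. (4.5): map a similarity A in [min(A), max(A)] onto the
    user's rating range [C, D]. When the user has too few ratings to define
    the similarity interval, a fallback (e.g. the user average) is returned."""
    if sim_max == sim_min:
        return fallback
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min

# A similarity of 0.5 halfway through [0, 1] maps to the middle of [1, 4].
print(min_max_rating(0.5, 0.0, 1.0, 1, 4))  # 2.5
```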
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U     (4.6)
         average rating - standard deviation,   if similarity < L
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                        Food.com    Epicurious
Number of users                         24741       8117
Number of food items                    226025      14976
Number of rating events                 956826      86574
Number of ratings above average         726467      46588
Number of groups                        108         68
Number of ingredients                   5074        338
Number of categories                    28          14
Sparsity of the ratings matrix          0.02%       0.07%
Avg. rating value                       4.68        3.34
Avg. number of ratings per user         38.67       10.67
Avg. number of ratings per item         4.23        5.78
Avg. number of ingredients per item     8.57        3.71
Avg. number of categories per item      2.33        0.60
Avg. number of food groups per item     0.87        0.61
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating with more accuracy. Later, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
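Eq. (4.6) with the initial thresholds can be sketched directly; whether `avg` and `std` come from the user, the recipe, or their combination is the choice discussed above:

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Sketch of Eq. (4.6): nudge the average rating up or down by one
    standard deviation depending on the similarity band."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std

print(rating_from_similarity(0.9, 3.5, 0.6))  # 4.1
print(rating_from_similarity(0.5, 3.5, 0.6))  # 3.5
print(rating_from_similarity(0.1, 3.5, 0.6))  # 2.9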
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Some examples of these features are:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipe features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistics for the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries), and the users' prototype vectors.
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by a discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections analyse two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set; ideally, it is repeated until all possible combinations of p are tested. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
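The 5-fold splitting scheme can be sketched as follows; the shuffling seed and the representation of rating events as opaque items are assumptions for illustration:

```python
import random

def five_fold_splits(rating_events, k=5, seed=0):
    """Sketch of k-fold cross-validation: shuffle the rating events, cut
    them into k folds, and yield (training, validation) pairs where each
    fold serves as the validation set exactly once."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# With 100 events, each fold holds 20% for validation and 80% for training.
for training, validation in five_fold_splits(range(100)):
    assert len(validation) == 20 and len(training) == 80
```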
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:

• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
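The two error measures can be sketched directly from their standard definitions:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; larger deviations weigh more than in MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual, predicted = [4, 3, 5, 2], [4, 4, 3, 2]
print(mae(actual, predicted))   # 0.75
print(rmse(actual, predicted))  # ~1.118
```

The example shows why RMSE exceeds MAE here: the single deviation of 2 is squared before averaging.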
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using specific dataset averages directly as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                    Epicurious          Food.com
                                    MAE       RMSE      MAE       RMSE
YoLP content-based component        0.6389    0.8279    0.3590    0.6536
YoLP collaborative component        0.6454    0.8678    0.3761    0.6834
User average                        0.6315    0.8338    0.4077    0.6207
Item average                        0.7701    1.0930    0.4385    0.7043
Combined average                    0.6628    0.8572    0.4180    0.6250
Table 5.2: Test results

                                        Epicurious                              Food.com
                                        Observation:      Observation:          Observation:      Observation:
                                        User Average      Fixed Threshold       User Average      Fixed Threshold
                                        MAE      RMSE     MAE      RMSE         MAE      RMSE     MAE      RMSE
User avg + user standard deviation      0.8217   1.0606   0.7759   1.0283       0.4448   0.6812   0.4287   0.6624
Item avg + item standard deviation      0.8914   1.1550   0.8388   1.1106       0.4561   0.7251   0.4507   0.7207
User/item avg + user and item
standard deviation                      0.8304   1.0296   0.7824   0.9927       0.4390   0.6506   0.4324   0.6449
Min-Max                                 0.8539   1.1533   0.7721   1.0705       0.6648   0.9847   0.6303   0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the rows of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. Observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                        Epicurious          Food.com
                                        MAE       RMSE      MAE       RMSE
Ingredients + Cuisine + Dietaries       0.7824    0.9927    0.4324    0.6449
Ingredients + Cuisine                   0.7915    1.0012    0.4384    0.6502
Ingredients + Dietary                   0.7874    0.9986    0.4342    0.6468
Cuisine + Dietary                       0.8266    1.0616    0.4324    0.7087
Ingredients                             0.7932    1.0054    0.4411    0.6537
Cuisine                                 0.8553    1.0810    0.5357    0.7431
Dietary                                 0.8772    1.0807    0.4579    0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With the feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
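The merge-and-compare step can be sketched with sparse dict vectors; the merge assumes the three feature types use disjoint key namespaces, as they do here (ingredients, cuisines and dietaries have distinct names):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge(*vectors):
    """Merge the per-feature-type prototype vectors (e.g. ingredients,
    cuisines, dietaries) selected for a given feature test; assumes the
    feature types share no keys."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

ingredients = {"garlic": 0.4}
cuisines = {"italian": 0.9}
profile = merge(ingredients, cuisines)        # the Ingredients + Cuisine test
recipe = {"garlic": 0.4, "italian": 0.9}
print(round(cosine(profile, recipe), 3))  # 1.0
```

Each row of Table 5.3 then corresponds to a different subset of vectors passed to `merge`, with no prototype rebuilding.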
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), it may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L
The initial threshold values, 0.75 for U and 0.25 for L, were good starting points to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
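The sweep itself is straightforward to sketch; here `evaluate` is a stand-in for a full cross-validation run of the recommendation component with the given thresholds:

```python
def sweep_lower_threshold(evaluate, upper=0.75, steps=6):
    """Sketch of the threshold-variation test: vary the lower threshold L
    in [0, 0.25] and record the error returned by a caller-supplied
    `evaluate(lower, upper)` function."""
    results = []
    for i in range(steps):
        lower = 0.25 * i / (steps - 1)
        results.append((lower, evaluate(lower, upper)))
    return results

# Stand-in evaluation: error grows with the lower threshold, as observed.
for lower, error in sweep_lower_threshold(lambda l, u: 0.78 + 0.1 * l):
    print(f"L={lower:.2f}  MAE={error:.4f}")
```

The upper-threshold test later in this section follows the same pattern, fixing L and varying U.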
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U          (5.1)
Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between tests.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help in understanding the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although the system predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were observed towards the higher values of the standard deviation, as that would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimension of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, showing that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users who rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found who rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
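The round-by-round simulation can be sketched as follows; `train_model` and `evaluate` are stand-ins for the real prototype construction and error measurement:

```python
def learning_curve(user_reviews, train_model, evaluate, start=1):
    """Sketch of the learning-curve test: for each round, one more review
    is moved from the validation set into the training set, the model is
    rebuilt, and the error on the remaining reviews is recorded."""
    curve = []
    for n in range(start, len(user_reviews)):
        training, validation = user_reviews[:n], user_reviews[n:]
        model = train_model(training)
        curve.append((n, evaluate(model, validation)))
    return curve

reviews = [(f"recipe{i}", 3 + i % 2) for i in range(10)]
# Toy stand-in model: predict the training-set average rating.
train = lambda tr: sum(r for _, r in tr) / len(tr)
err = lambda avg, va: sum(abs(r - avg) for _, r in va) / len(va)
print(learning_curve(reviews, train, err)[:3])
```

Averaging these per-user curves over all selected users yields the plots in Figures 5.8 to 5.10.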
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had not previously been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results in transforming the similarity value into a rating value. Combined, these approaches returned the best performance values for the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Given that the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information contains only the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; adding to this the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to explore in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
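This future-work idea can be sketched in a few lines; the class vectors, the rating keys and the dot-product similarity are all hypothetical illustrations, not an implemented component:

```python
def classify_recipe(recipe_vector, class_vectors, similarity):
    """Sketch of the class-vector idea: keep one prototype vector per
    rating class for each user and predict the rating whose class vector
    is most similar to the recipe; no similarity-to-rating conversion
    is needed."""
    return max(class_vectors,
               key=lambda rating: similarity(class_vectors[rating], recipe_vector))

# Hypothetical class vectors for two rating classes and a dot-product similarity.
dot = lambda u, v: sum(w * v.get(k, 0.0) for k, w in u.items())
classes = {1: {"olive": 1.0}, 4: {"garlic": 0.8, "basil": 0.5}}
print(classify_recipe({"garlic": 1.0}, classes, dot))  # 4
```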
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, volume 14, pages 1137–1143, 1995. ISSN 10450823.
Figure 3.1: Recipe-ingredient breakdown and reconstruction
3.3 Recommending Food: Reasoning on Recipes and Ingredients
A previous article studied the applicability of recommender techniques in the food and diet domain [22]. The performance of collaborative filtering, content-based, and hybrid recommender algorithms is evaluated on a dataset of 43,000 ratings from 512 users. Although the main focus of this article is the content, or ingredients, of a meal, various other variables that impact a user's opinion in food recommendation are mentioned. These other variables include cooking methods, ingredient costs and quantities, preparation time, and ingredient combination effects, amongst others. The decomposition of recipes into ingredients (Figure 3.1) implemented in this experiment is simplistic: ingredient scores were computed by averaging the ratings of the recipes in which they occur.

As a baseline algorithm, a random recommender was implemented, which assigns a randomly generated prediction score to a recipe. Five different recommendation strategies were developed for personalized recipe recommendations.

The first is a standard collaborative filtering algorithm, assigning predictions to recipes based on the weighted ratings of a set of N neighbors. The second is a content-based algorithm, which breaks down each recipe into ingredients and assigns ratings to them based on the user's recipe scores. Finally, with the user's ingredient scores, a prediction is computed for the recipe.
Two hybrid strategies were also implemented, namely hybrid recipe and hybrid ingr. Both use a simple pipelined hybrid design, where the content-based approach provides predictions for missing ingredient ratings in order to reduce the data sparsity of the ingredient matrix. This matrix is then used by the collaborative approach to generate recommendations. These strategies differ from one another in the approach used to compute user similarity. The hybrid recipe method identifies a set of neighbors with user similarity based on recipe scores. In hybrid ingr, user similarity is based on the recipe rating scores after the recipes are broken down into ingredients. Lastly, an intelligent strategy was implemented. In this strategy, only the positive ratings for items that receive mixed ratings are considered; it is assumed that common items in recipes with mixed ratings are not the cause of the
high variation in score. The results of the study are presented in Figure 3.2, using the normalized MAE as an evaluation metric.

Figure 3.2: Normalized MAE score for recipe recommendation [22]
This work shows that the content-based approach has the best overall performance in this case, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations: items too similar to others known by the user probably carry the same information and will not help them gather more information about a particular news topic. These items are therefore excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story being classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it, and there is no need to recommend a story the user already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].

This issue should be taken into consideration in food recommendations as well, since users are usually not interested in recommendations with contents too similar to dishes they have recently eaten.
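The short-term voting scheme described above can be sketched as follows; the threshold values and the (similarity, score) pairs are illustrative, not taken from [23].

```python
# Sketch of Daily-Learner's short-term voting: stories above a minimum
# similarity vote, the prediction is their similarity-weighted average,
# and a voter above a maximum similarity marks the new story as "known".
def predict_story(similarity_score_pairs, t_min=0.3, t_max=0.9):
    voters = [(s, r) for s, r in similarity_score_pairs if s >= t_min]
    if not voters:
        return None, 'unclassified'          # falls through to the long-term model
    if any(s >= t_max for s, _ in voters):
        return None, 'known'                 # too similar: assumed already seen
    num = sum(s * r for s, r in voters)
    den = sum(s for s, _ in voters)
    return num / den, 'predicted'
```

Used, for instance, as `predict_story([(0.5, 4), (0.4, 2)])`, which yields a weighted average of the two voting scores.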
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.

In the user-to-user approach, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach, the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the

1 https://www.python.org
Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation
same group of users.

The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)

2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html

In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating r_{u,b} for each item b is weighted according to the similarity between b and the target item a, and the predicted rating is normalized by the sum of similarities.
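A minimal sketch of Eqs. (4.1) and (4.2), assuming ratings stored as a nested dict {user: {recipe: rating}}; all names and data are illustrative, not the actual YoLP implementation.

```python
# Sketch of item-to-item collaborative filtering: Pearson item similarity
# (Eq. 4.1) and the similarity-weighted rating prediction (Eq. 4.2).
from math import sqrt

def item_similarity(ratings, a, b):
    """Pearson correlation between recipes a and b over their common raters."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0
    avg_a = sum(ratings[u][a] for u in common) / len(common)
    avg_b = sum(ratings[u][b] for u in common) / len(common)
    num = sum((ratings[u][a] - avg_a) * (ratings[u][b] - avg_b) for u in common)
    den = (sqrt(sum((ratings[u][a] - avg_a) ** 2 for u in common))
           * sqrt(sum((ratings[u][b] - avg_b) ** 2 for u in common)))
    return num / den if den else 0.0

def predict(ratings, user, a):
    """Weighted average of the user's ratings over positively similar items."""
    num = den = 0.0
    for b, r in ratings[user].items():
        if b == a:
            continue
        s = item_similarity(ratings, a, b)
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None
```

Only positively correlated neighbors contribute here; how negative similarities are handled is a design choice not specified in the text.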
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, results of comparable or better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all of the recipe's features are added as binary values to the profile vector.
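The profile construction and cosine matching just described can be sketched as follows; the rating threshold of 4 follows the text, while the feature names and data layout are illustrative assumptions.

```python
# Sketch of the content-based matching: recipes as sparse binary feature
# vectors, the user profile as the union of features of positively rated
# recipes (rating of 4 or 5).
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0) for f, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_profile(rated_recipes):
    """rated_recipes: iterable of (feature_set, rating) pairs."""
    profile = {}
    for features, rating in rated_recipes:
        if rating >= 4:                      # positive rating: add binary features
            for f in features:
                profile[f] = 1
    return profile

profile = build_profile([({'italian', 'pasta', 'tomato'}, 5),
                         ({'dessert', 'sugar'}, 2)])
recipe = {f: 1 for f in ('pasta', 'tomato', 'basil')}
score = cosine(recipe, profile)
```

Candidate recipes would then simply be sorted by `score`, from most to least similar.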
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:

IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
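Under the assumption stated above (feature frequency fixed at 1), the weight of Eq. (4.4) reduces to a simple corpus statistic; the toy recipe collection below is illustrative.

```python
# IRF weighting of Eq. (4.4): rarer ingredients receive higher weights,
# exactly as rare terms do under inverse document frequency.
from math import log

recipes = {
    'r1': {'garlic', 'tomato', 'pasta'},
    'r2': {'garlic', 'beef'},
    'r3': {'tomato', 'basil'},
    'r4': {'garlic', 'tomato', 'basil'},
}

def irf(ingredient, recipes):
    m = len(recipes)                                                 # total recipes, M
    mk = sum(1 for ings in recipes.values() if ingredient in ings)   # recipes with it, M_k
    return log(m / mk)

weights = {i: irf(i, recipes) for i in ('garlic', 'beef')}
```

Here beef (1 of 4 recipes) is weighted higher than garlic (3 of 4 recipes), matching the intended behavior.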
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
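The update rule just described can be sketched as follows, using the second approach (the user's average rating as the positive/negative threshold) and equal observation weights of 1; data and names are illustrative.

```python
# Sketch of the Rocchio-style prototype update: IRF feature weights are
# added for positive observations and subtracted for negative ones.
def build_prototype(rated_recipes, feature_weights):
    """rated_recipes: list of (feature_set, rating) pairs from the training set."""
    user_avg = sum(r for _, r in rated_recipes) / len(rated_recipes)
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1 if rating >= user_avg else -1   # positive vs. negative observation
        for f in features:
            prototype[f] = prototype.get(f, 0.0) + sign * feature_weights.get(f, 0.0)
    return prototype

proto = build_prototype([({'garlic', 'pasta'}, 5), ({'beef'}, 1)],
                        {'garlic': 0.3, 'pasta': 0.5, 'beef': 1.4})
```

With a user average of 3, the first recipe contributes positively and the second negatively, so disliked features end up with negative prototype weights.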
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
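Eq. (4.5) applied per user can be sketched as below; the fallback when the similarity interval is empty follows the text, and all values are illustrative.

```python
# Min-Max normalization (Eq. 4.5): map a similarity value from the user's
# similarity range [a_min, a_max] onto their rating range [c, d].
def min_max(a, a_min, a_max, c, d):
    if a_max == a_min:        # similarity interval unavailable: caller falls
        return None           # back to the user average, as described above
    return (a - a_min) / (a_max - a_min) * (d - c) + c

pred = min_max(0.6, a_min=0.2, a_max=1.0, c=1, d=5)
```

A similarity of 0.6, halfway through the user's [0.2, 1.0] similarity range, lands halfway through the [1, 5] rating range, i.e., 3.0.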
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final recommended rating for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                          Food.com    Epicurious
Number of users                           24,741      8,117
Number of food items                      226,025     14,976
Number of rating events                   956,826     86,574
Number of ratings above average           726,467     46,588
Number of groups                          108         68
Number of ingredients                     5,074       338
Number of categories                      28          14
Sparsity of the ratings matrix            0.02%       0.07%
Average rating value                      4.68        3.34
Average number of ratings per user        38.67       10.67
Average number of ratings per item        4.23        5.78
Average number of ingredients per item    8.57        3.71
Average number of categories per item     2.33        0.60
Average number of food groups per item    0.87        0.61
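A sketch of Eq. (4.6) with the initial thresholds U = 0.75 and L = 0.25; the average and standard deviation would come from the training set, and the sample values below are illustrative.

```python
# Sketch of the threshold-based rating generation of Eq. (4.6): high
# similarity pushes the prediction one standard deviation above the
# average, low similarity one standard deviation below.
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

r = similarity_to_rating(0.8, avg=3.5, std=0.6)
```

The same function covers all three tested variants (user, item, or combined statistics) by passing the corresponding `avg` and `std`.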
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset, previously made available by [25], was collected from a large online recipe sharing community, Food.com. The second dataset is composed of crawled data obtained from a website named Epicurious. This dataset initially contained 51,324 active users and 160,536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
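The filtering step can be sketched as follows; the thresholds (keep recipes rated more than 3 times, then users who rated more than 5 times) follow the text, while the (user, recipe, rating) event format is an assumption.

```python
# Sketch of the sparsity filter: drop rarely rated recipes first, then
# drop users with too few remaining ratings.
from collections import Counter

def filter_events(events, min_item=4, min_user=6):
    item_counts = Counter(recipe for _, recipe, _ in events)
    events = [e for e in events if item_counts[e[1]] >= min_item]
    user_counts = Counter(user for user, _, _ in events)
    return [e for e in events if user_counts[e[0]] >= min_user]
```

Note that filtering items first can push some users below the user threshold; whether the two passes were iterated to a fixed point is not stated in the text.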
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

4 http://www.food.com
5 http://www.epicurious.com

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
A recipe can have multiple cuisines, multiple dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website's users when performing a review.

Figures 4.3, 4.4, and 4.5 present some graphical statistics of the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, taking p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set. Ideally, this process is repeated until all possible combinations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
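The fold construction just described can be sketched as follows; the round-robin assignment of events to folds is an illustrative choice, not necessarily the one used in this work.

```python
# Sketch of a 5-fold split: each fold holds out 20% of the rating events
# for validation and trains on the remaining 80%.
def k_fold_splits(events, k=5):
    folds = [events[i::k] for i in range(k)]         # round-robin fold assignment
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

splits = list(k_fold_splits(list(range(10))))
```

Each of the 5 (training, validation) pairs covers the full event list exactly once, with no overlap between the two sets.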
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
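The two error measures can be computed directly from their definitions over paired actual and predicted ratings; the sample values are illustrative.

```python
# MAE and RMSE over paired (actual, predicted) ratings: MAE averages the
# absolute deviations, RMSE the root of the averaged squared deviations.
from math import sqrt

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

err_mae = mae([4, 3, 5], [3.5, 3, 4])
err_rmse = rmse([4, 3, 5], [3.5, 3, 4])
```

RMSE penalizes large deviations more heavily than MAE, which is why both are reported side by side in the result tables.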
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious        Foodcom
                                  MAE     RMSE      MAE     RMSE
  YoLP Content-based component    0.6389  0.8279    0.3590  0.6536
  YoLP Collaborative component    0.6454  0.8678    0.3761  0.6834
  User Average                    0.6315  0.8338    0.4077  0.6207
  Item Average                    0.7701  1.0930    0.4385  0.7043
  Combined Average                0.6628  0.8572    0.4180  0.6250
Table 5.2: Test Results

                                      Epicurious                             Foodcom
                                      Observation       Observation          Observation       Observation
                                      User Average      Fixed Threshold      User Average      Fixed Threshold
                                      MAE     RMSE      MAE     RMSE         MAE     RMSE      MAE     RMSE
  User Avg + User Standard Deviation  0.8217  1.0606    0.7759  1.0283       0.4448  0.6812    0.4287  0.6624
  Item Avg + Item Standard Deviation  0.8914  1.1550    0.8388  1.1106       0.4561  0.7251    0.4507  0.7207
  User/Item Avg + User and Item
  Standard Deviation                  0.8304  1.0296    0.7824  0.9927       0.4390  0.6506    0.4324  0.6449
  Min-Max                             0.8539  1.1533    0.7721  1.0705       0.6648  0.9847    0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                      Epicurious        Foodcom
                                      MAE     RMSE      MAE     RMSE
  Ingredients + Cuisine + Dietaries   0.7824  0.9927    0.4324  0.6449
  Ingredients + Cuisine               0.7915  1.0012    0.4384  0.6502
  Ingredients + Dietary               0.7874  0.9986    0.4342  0.6468
  Cuisine + Dietary                   0.8266  1.0616    0.4324  0.7087
  Ingredients                         0.7932  1.0054    0.4411  0.6537
  Cuisine                             0.8553  1.0810    0.5357  0.7431
  Dietary                             0.8772  1.0807    0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the best-performing method combination was the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
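The prototype-building step can be sketched as a simplified, unnormalized Rocchio-style update. The sketch assumes ratings above 3 count as positive and the rest as negative (the text does not state on which side of the threshold a rating of exactly 3 falls), and uses invented TF-IDF-style feature weights:

```python
from collections import defaultdict

def build_prototype(rated_recipes, threshold=3):
    """Rocchio-style prototype: add the feature vectors of positively rated
    recipes (rating > threshold) and subtract the negatively rated ones.
    rated_recipes: list of (rating, feature_weights) pairs, where
    feature_weights maps a feature name to its TF-IDF-style weight."""
    prototype = defaultdict(float)
    for rating, features in rated_recipes:
        sign = 1.0 if rating > threshold else -1.0
        for feature, weight in features.items():
            prototype[feature] += sign * weight
    return dict(prototype)

rated = [(5, {"garlic": 0.8, "italian": 0.5}),
         (1, {"cilantro": 0.9, "italian": 0.2})]
proto = build_prototype(rated)
print(round(proto["italian"], 2))  # 0.5 - 0.2 = 0.3
```

Classic Rocchio additionally scales the positive and negative centroids by per-class weights; the unit weights here are a simplification for illustration.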
Computing the user's prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
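The stored sub-vector representation can be sketched as follows (hypothetical structure and weights; since the three feature types occupy disjoint positions in the sparse vector, a dictionary merge is safe):

```python
def merge_subvectors(user_vectors, active_features):
    """Each user stores one sub-vector per feature type (ingredients,
    cuisine, dietary); merging the active ones rebuilds the prototype
    for a given feature combination without recomputing anything."""
    merged = {}
    for feature_type in active_features:
        merged.update(user_vectors[feature_type])
    return merged

user_vectors = {
    "ingredients": {"garlic": 0.7, "basil": 0.4},
    "cuisine": {"italian": 0.9},
    "dietary": {"vegetarian": 0.6},
}
# Rebuild the prototype for the Ingredients + Cuisine combination of Table 5.3
proto = merge_subvectors(user_vectors, ["ingredients", "cuisine"])
print(sorted(proto))  # ['basil', 'garlic', 'italian']
```

Testing a new feature combination then costs one dictionary merge per user instead of a full pass over the training folds.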
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test, using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

  Rating = { average rating + standard deviation   if similarity >= U
           { average rating                        if L <= similarity < U      (4.6)
           { average rating − standard deviation   if similarity < L
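A direct transcription of the three cases, with the initial thresholds used in the experiments (avg and std stand for whichever average/standard-deviation combination is being tested):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Three-case conversion of Eq. 4.6, with U = 0.75 and L = 0.25."""
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

# Hypothetical user with average rating 3.5 and standard deviation 0.8
print(round(similarity_to_rating(0.9, avg=3.5, std=0.8), 1))  # 4.3
print(round(similarity_to_rating(0.5, avg=3.5, std=0.8), 1))  # 3.5
print(round(similarity_to_rating(0.1, avg=3.5, std=0.8), 1))  # 2.7
```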
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test, using the Foodcom dataset
Figure 5.4: Upper similarity threshold variation test, using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test, using the Foodcom dataset
  Rating = { average rating + standard deviation   if similarity >= U
           { average rating                        if similarity < U           (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
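The sweep itself can be sketched as follows, here over toy (similarity, actual rating) pairs rather than the real cross-validation data:

```python
import math

def sweep_upper_threshold(pairs, avg, std, thresholds):
    """For each candidate U, convert similarities to ratings with the
    two-case rule of Eq. 5.1 and record MAE and RMSE against the actual
    ratings. pairs: list of (similarity, actual_rating)."""
    results = {}
    for u in thresholds:
        errors = [(avg + std if sim >= u else avg) - actual
                  for sim, actual in pairs]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[u] = (mae, rmse)
    return results

# Hypothetical predictions for a user with average 3.5 and deviation 1.0
pairs = [(0.9, 5), (0.6, 4), (0.3, 3), (0.1, 2)]
for u, (mae, rmse) in sweep_upper_threshold(pairs, 3.5, 1.0, [0.25, 0.5, 0.75]).items():
    print(u, round(mae, 3), round(rmse, 3))
```

In the real experiments the inner loop is the full 5-fold cross-validation run, which is what makes the sweep expensive.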
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation, from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation, from the Foodcom dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the user's preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain amount of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve, using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve, using the Foodcom dataset, up to 100 rated recipes
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve, using the Foodcom dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
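The round-by-round protocol can be sketched generically. The scorer below is a stand-in for illustration, not the Rocchio component:

```python
def learning_curve(user_reviews, train_and_score):
    """Simulate incremental learning: start with one review in the training
    set and move one more over from the validation set each round, recording
    the error. train_and_score(training, validation) -> error is supplied
    by the caller (e.g. the experimental component's MAE)."""
    curve = []
    for n in range(1, len(user_reviews)):
        training, validation = user_reviews[:n], user_reviews[n:]
        curve.append((n, train_and_score(training, validation)))
    return curve

# Toy scorer that merely pretends the error shrinks as the training set grows
toy_reviews = list(range(6))
curve = learning_curve(toy_reviews, lambda tr, va: 1.0 / len(tr))
print(curve[0], curve[-1])  # (1, 1.0) (5, 0.2)
```

The per-user curves are then averaged over the selected user group (71, 1571, or 269 users) to produce Figures 5.8 to 5.10.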
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested, further exploring the breaking down of recipes into ingredients presented in [22] and using more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results in transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing validation of the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of the day (e.g., lunch or dinner), the total meal cost, the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation, volume 8444. 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
Figure 3.2: Normalized MAE score for recipe recommendation [22]
MAE as an evaluation metric.
This work shows that the content-based approach, in this case, has the best overall performance, with a significant accuracy improvement over the collaborative filtering algorithm. Furthermore, the authors concluded that this work implemented a simplistic version of what a recipe recommender needs to achieve. As mentioned earlier, there are many other factors that influence a user's rating that can be considered to improve content-based food recommendations.
3.4 User Modeling for Adaptive News Access
Similarity is an important subject in many recommendation methods. Still, similar items are not the only ones that matter when calculating a prediction. In some cases, items that are too similar to others which have already been seen should not be recommended either. This idea is used in Daily-Learner [23], a well-known content-based news article recommendation system. When helping the user obtain more knowledge about a news topic, a certain variety should exist in the recommendations. Items too similar to others known by the user probably carry the same information and will not help them gather more information about a particular news topic; these items are then excluded from the recommendation. On the other hand, items similar in topic but not similar in content should be great recommendations in the context of this system. Therefore, the use of similarity can be adjusted according to the objectives of the recommendation system.
In order to identify the current user interests, Daily-Learner uses the nearest neighbor algorithm to model the users' short-term interests. As previously mentioned in Section 2.1.1, nearest neighbor algorithms simply store all training data in memory. When classifying a new unlabeled item, the algorithm compares it to all stored items using a similarity function and then determines the nearest neighbor, or the k nearest neighbors. Daily-Learner uses this method as it can quickly adapt to a user's novel interests. The main advantage of the nearest-neighbor approach is that only a single story of a new topic is needed to allow the algorithm to identify future follow-up stories.
Stories in Daily-Learner are converted to TF-IDF vectors, and the cosine similarity measure is used to quantify the similarity between two vectors. When computing a prediction for a new story, all the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story to be classified become voting stories. The predicted score is then computed as the weighted average over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as known, because the system assumes that the user is already aware of the event reported in it and does not need a recommendation of a story he already knows. If the story does not have any voters, it cannot be classified by the short-term model and is passed to the long-term model, explained in more detail in [23].
This issue should be taken into consideration in food recommendations, as users are usually not interested in recommendations with contents too similar to dishes recently eaten.
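The voting scheme can be sketched as follows. The min/max thresholds and vectors here are illustrative, not Daily-Learner's actual values:

```python
import math

def predict_score(new_story, rated_stories, min_sim=0.3, max_sim=0.95):
    """Short-term prediction sketch: stories closer than min_sim vote,
    weighted by cosine similarity; a voter above max_sim marks the new
    story as already known. Vectors are dicts of TF-IDF weights;
    rated_stories is a list of (vector, score) pairs."""
    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    voters = []
    for vector, score in rated_stories:
        sim = cosine(new_story, vector)
        if sim >= max_sim:
            return "known"   # user is probably already aware of this event
        if sim >= min_sim:
            voters.append((sim, score))
    if not voters:
        return None          # falls through to the long-term model
    return sum(s * sc for s, sc in voters) / sum(s for s, _ in voters)
```

For food recommendation, the analogous rule would suppress recipes nearly identical to recently eaten dishes while still letting moderately similar ones vote.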
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component, measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach, two items are considered similar if they were rated in a similar way by the
¹ https://www.python.org
Figure 4.1: System architecture
Figure 4.2: Item-to-item collaborative recommendation²
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

  sim(a, b) = Σp∈P (ra,p − r̄a)(rb,p − r̄b) / ( √Σp∈P (ra,p − r̄a)² · √Σp∈P (rb,p − r̄b)² )      (4.1)

where a and b are recipes, ra,p is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, r̄a and r̄b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

  pred(u, a) = Σb∈N sim(a, b) · (ru,b − r̄b) / Σb∈N sim(a, b)      (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the rating given to each item b is weighted according to the similarity between b and the target item a, and the weighted sum is normalized by the sum of the similarities.
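A sketch of Eqs. 4.1 and 4.2 with toy data follows. In this sketch, the target recipe's average is added back to the mean-centred weighted sum so that the prediction lands on the rating scale; that final step is an assumption of the sketch, not something stated in the text:

```python
import math

def item_avg(ratings, item):
    # average rating of an item over every user that rated it
    rs = [r[item] for r in ratings.values() if item in r]
    return sum(rs) / len(rs)

def pearson_item_sim(ratings, a, b):
    """Eq. 4.1: Pearson similarity between recipes a and b, over the set P
    of users that rated both. ratings: {user: {recipe: rating}}."""
    shared = [u for u, r in ratings.items() if a in r and b in r]
    if not shared:
        return 0.0
    ra, rb = item_avg(ratings, a), item_avg(ratings, b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in shared)
    den = (math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in shared))
           * math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in shared)))
    return num / den if den else 0.0

def predict(ratings, u, a):
    """Eq. 4.2: similarity-weighted average of the user's mean-centred
    ratings, with the target recipe's average added back (an assumption)."""
    num = den = 0.0
    for b in ratings[u]:
        if b == a:
            continue
        s = pearson_item_sim(ratings, a, b)
        num += s * (ratings[u][b] - item_avg(ratings, b))
        den += abs(s)
    return item_avg(ratings, a) + (num / den if den else 0.0)

ratings = {"u1": {"a": 5, "b": 5, "c": 1},
           "u2": {"a": 4, "b": 4, "c": 2},
           "u3": {"a": 1, "b": 2, "c": 5}}
print(round(pearson_item_sim(ratings, "a", "b"), 3))  # 0.996, rated alike
```

Recipes a and b are rated almost identically by the three users, so their similarity is close to 1, while a and c are rated in opposite ways and come out negatively correlated.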
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values for the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
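The profile construction and comparison can be sketched as follows (hypothetical feature names; real vectors also carry category, region, restaurant ID, and context features):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_profile(rated_recipes):
    """Binary user profile: set a 1 for every feature of each recipe the
    user rated 4 or 5; lower-rated recipes contribute nothing."""
    profile = {}
    for rating, features in rated_recipes:
        if rating >= 4:
            for f in features:
                profile[f] = 1.0
    return profile

profile = build_profile([(5, ["pasta", "italian", "rest42"]),
                         (2, ["sushi", "japanese"])])
recipe = {"pasta": 1.0, "italian": 1.0, "tomato": 1.0}
print(round(cosine(profile, recipe), 3))  # 0.667
```

Candidate recipes are then sorted by this similarity value, which is exactly why a separate similarity-to-rating conversion is needed to compute MAE and RMSE.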
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating =
  avgTotal + 0.5   if similarity > 0.8
  avgTotal         otherwise          (4.3)
Where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights for Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced and, lastly, the problem of transforming a similarity measure into a rating value is presented, and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to explore further in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)     (4.4)
Where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
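A small sketch of this IRF computation over a collection of recipe feature lists (the function name is illustrative):

```python
import math
from collections import Counter

def irf_weights(recipes):
    """IRF_k = log(M / M_k), where M is the total number of recipes and
    M_k the number of recipes containing feature k (Eq. 4.4)."""
    m = len(recipes)
    doc_freq = Counter(f for features in recipes for f in set(features))
    return {k: math.log(m / mk) for k, mk in doc_freq.items()}
```

As with IDF in text retrieval, features that appear in fewer recipes receive larger weights.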
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
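The add/subtract update described above can be sketched as follows, here with the fixed-threshold variant for deciding positive versus negative observations (names are illustrative; neutral ratings, as in Foodcom, would simply be skipped before this step):

```python
from collections import defaultdict

def build_prototype(rated_recipes, irf, positive_threshold=3):
    """Rocchio prototype vector: add the IRF feature weights of positively
    rated recipes, subtract those of negatively rated ones (equal weight 1
    for positive and negative observations, as in the experiments)."""
    proto = defaultdict(float)
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= positive_threshold else -1.0
        for f in features:
            proto[f] += sign * irf.get(f, 0.0)
    return dict(proto)
```

The user-average variant would replace `positive_threshold` with the user's average rating from the training set.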
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:
B = (A − minimum value of A) / (maximum value of A − minimum value of A) * (D − C) + C     (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A − minimum value of A), the user average was used as default for the recommendation.
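A sketch of this per-user mapping, including the fallback to the user average when the similarity interval cannot be computed (names and parameters are illustrative):

```python
def minmax_rating(sim, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Eq. 4.5 applied per user: map a similarity from the user's observed
    similarity interval onto the user's rating interval; fall back to the
    user average when the similarity interval is degenerate."""
    if sim_max <= sim_min:
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (rating_max - rating_min) + rating_min
```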
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating =
  average rating + standard deviation   if similarity >= U
  average rating                        if L <= similarity < U
  average rating − standard deviation   if similarity < L     (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Foodcom    Epicurious
  Number of users                      24741        8117
  Number of food items                226025       14976
  Number of rating events             956826       86574
  Number of ratings above avg         726467       46588
  Number of groups                       108          68
  Number of ingredients                 5074         338
  Number of categories                    28          14
  Sparsity on the ratings matrix       0.02%       0.07%
  Avg rating values                     4.68        3.34
  Avg number of ratings per user       38.67       10.67
  Avg number of ratings per item        4.23        5.78
  Avg number of ingredients per item    8.57        3.71
  Avg number of categories per item     2.33        0.60
  Avg number of food groups per item    0.87        0.61

This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
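The rule of Eq. 4.6, with the initial thresholds U = 0.75 and L = 0.25, can be sketched as:

```python
def threshold_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average up or down by one standard deviation
    depending on where the similarity falls relative to U and L."""
    if sim >= upper:
        return avg + std
    if sim < lower:
        return avg - std
    return avg
```

Here `avg` and `std` stand for any of the three variants tested: the user's, the recipe's, or their combination.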
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset has been filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
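A sketch of this filtering step; applying the two thresholds repeatedly until the dataset is stable is an assumption of this sketch, since each removal pass can push other users or items below the thresholds:

```python
from collections import Counter

def filter_dataset(events, min_item_ratings=4, min_user_ratings=6):
    """Keep recipes rated at least 4 times and users with at least 6
    ratings (i.e., drop those rated 'no more than 3'/'no more than 5'
    times), repeating until no further removals occur."""
    changed = True
    while changed:
        user_counts = Counter(u for u, _, _ in events)
        item_counts = Counter(i for _, i, _ in events)
        kept = [(u, i, r) for u, i, r in events
                if user_counts[u] >= min_user_ratings and item_counts[i] >= min_item_ratings]
        changed = len(kept) != len(events)
        events = kept
    return events
```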
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Foodcom rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistical data for the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
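One way to sketch such a 5-fold split over rating events; the shuffling and the interleaved fold assignment are assumptions of this sketch, not necessarily the exact procedure used in this work:

```python
import random

def five_fold_splits(rating_events, seed=0):
    """Shuffle the rating events once, then yield 5 (train, validation)
    pairs; each fold holds out 20% of the events for validation and
    trains on the remaining 80%."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    k = 5
    for i in range(k):
        validation = events[i::k]
        train = [e for j, e in enumerate(events) if j % k != i]
        yield train, validation
```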
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
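The two measures can be sketched directly from their definitions:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```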
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using the 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious            Foodcom
                                  MAE      RMSE       MAE      RMSE
  YoLP Content-based component    0.6389   0.8279     0.3590   0.6536
  YoLP Collaborative component    0.6454   0.8678     0.3761   0.6834
  User Average                    0.6315   0.8338     0.4077   0.6207
  Item Average                    0.7701   1.0930     0.4385   0.7043
  Combined Average                0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                          Epicurious                          Foodcom
                               Observation       Observation       Observation       Observation
                               User Average      Fixed Threshold   User Average      Fixed Threshold
                               MAE      RMSE     MAE      RMSE     MAE      RMSE     MAE      RMSE
  User Avg + User Standard
  Deviation                    0.8217   1.0606   0.7759   1.0283   0.4448   0.6812   0.4287   0.6624
  Item Avg + Item Standard
  Deviation                    0.8914   1.1550   0.8388   1.1106   0.4561   0.7251   0.4507   0.7207
  User/Item Avg + User and
  Item Standard Deviation      0.8304   1.0296   0.7824   0.9927   0.4390   0.6506   0.4324   0.6449
  Min-Max                      0.8539   1.1533   0.7721   1.0705   0.6648   0.9847   0.6303   0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2, and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious            Foodcom
                                    MAE      RMSE       MAE      RMSE
  Ingredients + Cuisine + Dietaries 0.7824   0.9927     0.4324   0.6449
  Ingredients + Cuisine             0.7915   1.0012     0.4384   0.6502
  Ingredients + Dietary             0.7874   0.9986     0.4342   0.6468
  Cuisine + Dietary                 0.8266   1.0616     0.4324   0.7087
  Ingredients                       0.7932   1.0054     0.4411   0.6537
  Cuisine                           0.8553   1.0810     0.5357   0.7431
  Dietary                           0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Foodcom dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating =
  average rating + standard deviation   if similarity >= U
  average rating                        if L <= similarity < U
  average rating − standard deviation   if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
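This kind of threshold variation test can be sketched as a simple grid search, where `evaluate` is assumed to run the cross-validation for a given threshold and return (MAE, RMSE); all names here are illustrative:

```python
def sweep_threshold(evaluate, candidates):
    """Hypothetical grid search over a similarity threshold: evaluate
    each candidate and keep the one with the lowest MAE."""
    results = {t: evaluate(t) for t in candidates}
    best = min(results, key=lambda t: results[t][0])
    return best, results
```

The same routine could optimize on RMSE instead, depending on which error measure the system cares about most.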
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Foodcom datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using Foodcom dataset

Figure 5.4: Upper similarity threshold variation test using Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Rating =
  average rating + standard deviation   if similarity >= U
  average rating                        if similarity < U     (5.1)

Figure 5.5: Upper similarity threshold variation test using Foodcom dataset
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Foodcom datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Foodcom dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so, for each round, an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset up to 40 rated recipes

Figure 5.9: Learning Curve using the Foodcom dataset up to 100 rated recipes

Figure 5.10: Learning Curve using the Foodcom dataset up to 500 rated recipes
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both in the recipes and in the prototype vectors and, together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the validation of the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored
in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance improvement of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example, the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e.,
lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these
features have on the recommendations is another interesting direction to approach in the future,
when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors; according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute a predicted
rating to it.
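Under the stated assumptions (one prototype per rating value, binary recipe feature vectors, and cosine similarity), this future-work idea can be sketched as follows. All function and variable names here are hypothetical illustrations written for this discussion, not code from the thesis implementation:

```python
import math

def class_prototypes(rated_recipes, feature_weights):
    """Build one prototype vector per rating value from the user's rated
    recipes. rated_recipes is a list of (features, rating) pairs."""
    classes = {}
    for features, rating in rated_recipes:
        proto = classes.setdefault(rating, {})
        for k in features:
            proto[k] = proto.get(k, 0.0) + feature_weights.get(k, 0.0)
    return classes

def classify_recipe(features, classes):
    """Predicted rating = the class whose prototype has the highest cosine
    similarity to the recipe's binary feature vector."""
    def cosine(proto):
        num = sum(proto.get(k, 0.0) for k in features)
        den = math.sqrt(len(features)) * math.sqrt(sum(w * w for w in proto.values()))
        return num / den if den else 0.0
    return max(classes, key=lambda r: cosine(classes[r]))
```

For example, a user who rated a truffle recipe 5 and a salt-and-egg recipe 2 would produce two class prototypes, and a new truffle recipe would fall into the 5-star class directly, with no similarity-to-rating conversion step.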
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in computer
science and information systems - a landscape of research. In E-Commerce and Web
Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web,
4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive
algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender
systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation
algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions
on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002.
doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative
filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved
recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by
considering the user's preference and ingredient quantity of target recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation,
and Personalization, volume 6075 of LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions
on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe
recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection.
International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Stories in Daily Learner are converted to TF-IDF vectors, and the cosine similarity measure is
used to quantify the similarity between two vectors. When computing a prediction for a new story, all
the stories that are closer than a minimum threshold (i.e., a minimum similarity value) to the story
being classified become voting stories. The predicted score is then computed as the weighted average
over all the voting stories' scores, using the similarity values as the weights. If a voter is closer than
a maximum threshold (i.e., a maximum similarity value) to the new story, the story is labeled as
known, because the system assumes that the user is already aware of the event reported in it and
does not need a recommendation for a story he already knows. If the story does not have any voters, it
cannot be classified by the short-term model and is passed to the long-term model, explained in
more detail in [23].
This issue should be taken into consideration in food recommendation as well, since users are usually
not interested in recommendations whose contents are too similar to dishes they have recently eaten.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are
presented. First, an introduction to the recommendation module is made, followed by the specification
of the methods used in the different recommendation components. Afterwards, the datasets chosen
to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP
collaborative recommender, the YoLP content-based recommender, and an experimental recommendation
component where various approaches are explored to adapt the Rocchio algorithm for
personalized food recommendations. These provide independent recommendations for the same input,
in order to evaluate improvements in prediction accuracy from the algorithms implemented in the
experimental component. The evaluation module independently evaluates each recommendation
component by measuring the performance of the algorithms using different metrics. The methods
used in this module are explained in detail in the following chapter. The programming language used
to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative
approach [24]. This approach is very similar to the user-to-user approach explained in detail
in Section 2.1.2.
In the user-to-user approach, the similarity between a pair of users is measured by the way both users
rate the same set of items, whereas in the item-to-item approach the similarity between a pair of items
is measured from the way they are rated by a shared set of users. In other words, in user-to-user,
two users are considered similar if they rate the same set of items in a similar way, whereas in
item-to-item, two items are considered similar if they were rated in a similar way by the
same group of users.

1 https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}   (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users
that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b,
respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}   (4.2)

2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html

In this formula, pred(u, a) is the prediction value for user u and item a, N is the set of items
rated by user u, and r_{u,b} is the rating given by user u to item b. Using the set of the user's rated items,
the rating for each item b is weighted according to the similarity between b and the target item a,
and the weighted sum is normalized by the sum of the similarities.
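As a concrete illustration, Eqs. (4.1) and (4.2) can be sketched in Python (the language used for the system). This is a minimal, self-contained sketch with toy data, not the YoLP implementation; it implements Eq. (4.2) exactly as printed, i.e., it returns the similarity-weighted deviation from the item averages:

```python
import math

def pearson_item_similarity(ratings_a, ratings_b):
    """Pearson correlation between two recipes (Eq. 4.1).
    ratings_a and ratings_b map user -> rating; the sums run over the
    users P who rated both recipes."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    avg_a = sum(ratings_a.values()) / len(ratings_a)
    avg_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - avg_a) * (ratings_b[p] - avg_b) for p in common)
    den = (math.sqrt(sum((ratings_a[p] - avg_a) ** 2 for p in common))
           * math.sqrt(sum((ratings_b[p] - avg_b) ** 2 for p in common)))
    return num / den if den else 0.0

def predict_rating(user_ratings, item_ratings, target):
    """Eq. (4.2): weight the user's mean-centered ratings by the
    similarity between each rated item b and the target item a."""
    sims = {b: pearson_item_similarity(item_ratings[target], item_ratings[b])
            for b in user_ratings}
    item_avg = {b: sum(r.values()) / len(r) for b, r in item_ratings.items()}
    num = sum(sims[b] * (user_ratings[b] - item_avg[b]) for b in user_ratings)
    den = sum(sims[b] for b in user_ratings)
    return num / den if den else 0.0

# Toy example: two recipes rated by three shared users.
item_ratings = {"a": {"u1": 5, "u2": 3, "u3": 4},
                "b": {"u1": 4, "u2": 2, "u3": 3}}
sim_ab = pearson_item_similarity(item_ratings["a"], item_ratings["b"])
```

Note that many implementations add the target item's average \bar{r}_a back to the result of Eq. (4.2) to obtain a prediction on the original rating scale; the sketch stays with the formula as printed.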
The item-based approach was chosen for the YoLP collaborative recommendation component
because it is computationally more efficient when recommending a fixed group of recipes. Recommendations
in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler
to measure the similarity between the user's rated recipes and the restaurant's recipes and compute
the predicted ratings from there. Another reason why the item-based collaborative approach was
chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting
that item-based algorithms can provide, with better computational performance, results of comparable or
better quality than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's
recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended
recipes are ordered from most to least similar. In this case, instead of representing recipes as
vectors of words, recipes are represented by vectors of different features. The features that compose
a recipe are category, region, restaurant ID, and ingredients. Context features are also considered
at the moment of the recommendation; these are temperature, period of the day, and season of the
year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e.,
when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values
to the profile vector.
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com
datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the
list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated.
However, in the content-based method, the recipes are ordered by the similarity values between the
recipe feature vector and the user profile vector. In order to transform the similarity measure into a
rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}   (4.3)

where avgTotal represents the combined user and item average for each recommendation. It
is important to note that the test results presented in Chapter 5 for the YoLP content-based method
are an approximation to the real values, since this method of transforming a similarity
measure into a rating is likely to introduce a small error in the results. Another approximation comes from the fact that
YoLP considers context features at the moment of recommendation, and these are not included
in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
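Eq. (4.3) is straightforward to express in code. A minimal sketch (the 0.8 threshold and 0.5 boost come directly from the equation; the function name is ours):

```python
def similarity_to_rating(similarity, user_avg, item_avg):
    """Map a cosine similarity to a rating (Eq. 4.3): the combined
    user/item average, boosted by 0.5 when the similarity exceeds 0.8."""
    avg_total = (user_avg + item_avg) / 2
    return avg_total + 0.5 if similarity > 0.8 else avg_total
```

For example, with a user average of 4.0 and a recipe average of 3.0, a similarity of 0.9 yields 4.0, while any similarity at or below 0.8 yields the plain combined average of 3.5.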
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based
methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe
as similar to words in a document led to the variation of TF-IDF weights developed in [3]. That
work presented good results in retrieving the user's favourite ingredients, which raised the following
question: could these results be further improved? As previously mentioned, the TF-IDF scheme
can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of
simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall
preference in ingredients could be estimated through the prototype vector, which represents the
learning in Rocchio's algorithm. These vectors contain the users' preferences, where the
positive and negative examples are obtained directly from the user's rated recipes/dishes. In this
section, the method used to compute the feature weights for Rocchio's algorithm
is presented. Next, two different approaches to build the users' prototype vectors are introduced,
and, lastly, the problem of transforming a similarity measure into a rating value is presented and the
solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the TF-IDF-inspired approach shown in
Eq. (3.1), used to estimate the user's favourite ingredients, is an interesting point to explore further
in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype
vectors representing the user's preferences, and FF-IRF has shown good results for extracting the
user's favourite ingredients, this measure can be used to attribute weights to the recipe's features
and build the prototype vectors. In this work, the frequency of use of the feature, F_k, is assumed to
always be 1. The main reason is the absence of timestamps in the datasets' reviews, which does
not allow determining the number of times that a feature is preferred during a period D. The Inverse
Recipe Frequency is used exactly as in [3]:
IRF_k = \log \frac{M}{M_k}   (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient
k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the
complete dataset.
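Eq. (4.4) can be computed in a single pass over the recipe collection. A minimal sketch (function and variable names are ours, not the thesis code):

```python
import math

def inverse_recipe_frequency(recipes):
    """IRF_k = log(M / M_k), Eq. (4.4): M recipes in total, M_k of which
    contain feature k. recipes is a list of feature lists."""
    M = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):  # count each feature once per recipe
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / mk) for k, mk in counts.items()}
```

A rare ingredient thus receives a high weight, while an ingredient present in every recipe gets a weight of log(1) = 0, mirroring the IDF intuition for words.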
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive
or negative, and the weight attributed to each determine the impact that a rated recipe has on the
user's prototype vector. In the experiments performed in this work, positive and negative observations
have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative
observation, two different approaches were studied. The first approach is simple: the lower rating
values are considered negative observations and the higher rating values positive observations.
In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations
and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same
process is applied, with the exception of ratings equal to 3, which are
considered neutral observations and are ignored. Both datasets used in the experiments will be
explained in detail in Section 4.4. The second approach uses the user's average rating
value, computed from the training set. If a rating event is lower than the user's average rating, it is
considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences.
These are obtained directly from the rating events contained in the training set. Depending
on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector.
In positive observations, the recipe's feature weights, determined by the IRF value, are added
to the vector; in negative observations, the feature weights are subtracted.
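The fixed-threshold construction described above can be sketched as follows, assuming the Food.com convention in which a rating of 3 is neutral and skipped. Names and the data layout are illustrative, not the thesis code:

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Build a user's prototype vector (fixed-threshold variant).
    Feature IRF weights are added for ratings above the threshold,
    subtracted for ratings below it; ratings equal to the threshold
    are treated as neutral observations and ignored."""
    prototype = {}
    for features, rating in rated_recipes:
        if rating == threshold:
            continue  # neutral observation
        sign = 1.0 if rating > threshold else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype
```

The user-average variant would differ only in how the sign is chosen: compare the rating against the user's training-set average instead of the fixed value 3.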
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find.
Epicurious and Food.com, which will be presented in the next section, are food-related datasets
with relevant information on the recipes and containing rating events from users to recipes. In order to
validate the methods explored in this work, the recommendation system also needs to return a rating
value. This problem was already mentioned when the YoLP content-based component was presented.
Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to
explore, since it can introduce considerable errors in the validation results. Next, two approaches to
translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many
normalization methods available; the technique chosen for this work was Min-Max normalization.
Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:³

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C   (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. The following steps were therefore applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped, for each user, onto the rating range, and the Min-Max normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (max(A)
− min(A)), the user average was used as the default for the recommendation.
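Applied per user, Eq. (4.5) together with the fallback described above can be sketched as follows (a minimal illustration; parameter names are ours):

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max, user_avg):
    """Min-Max normalization (Eq. 4.5): map a similarity from the user's
    observed similarity interval [sim_min, sim_max] onto the user's rating
    range [rating_min, rating_max]. When the interval is degenerate (too
    few ratings to compute it), fall back to the user's average rating."""
    if sim_max == sim_min:
        return user_avg
    return ((similarity - sim_min) / (sim_max - sim_min)
            * (rating_max - rating_min) + rating_min)
```

For instance, a similarity of 0.5 inside an observed interval [0.0, 1.0] maps to the midpoint 3.0 of a 1-to-5 rating range.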
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}   (4.6)
Three different approaches were tested: using the user's rating average and standard deviation;
using the recipe's rating average and standard deviation; and using the combined average of the
user's and the recipe's averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments

                                          Food.com    Epicurious
  Number of users                           24,741       8,117
  Number of food items                     226,025      14,976
  Number of rating events                  956,826      86,574
  Number of ratings above average          726,467      46,588
  Number of groups                             108          68
  Number of ingredients                      5,074         338
  Number of categories                          28          14
  Sparsity of the ratings matrix             0.02%       0.07%
  Average rating value                        4.68        3.34
  Avg. number of ratings per user            38.67       10.67
  Avg. number of ratings per item             4.23        5.78
  Avg. number of ingredients per item         8.57        3.71
  Avg. number of categories per item          2.33        0.60
  Avg. number of food groups per item         0.87        0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should
yield a higher rating for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine the final rating
recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L
is 0.25.
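Eq. (4.6) reduces to a three-way branch on the similarity. A minimal sketch with the initial thresholds U = 0.75 and L = 0.25; for the combined variant, `avg_rating` and `std_dev` would themselves be the means of the user's and the recipe's statistics (our reading of the text above):

```python
def avg_std_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Eq. (4.6): shift the average rating up by one standard deviation
    for high similarities, down for low ones, and leave it unchanged
    in between."""
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev
```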
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large online⁴
recipe-sharing community. The second dataset is composed of data crawled from a
website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated
recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated
no more than 3 times were removed, as well as all users who rated no more than 5 times. Table
4.1 presents a statistical characterization of the two datasets after the filter was applied.
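The filtering step can be sketched as below. The thesis does not say whether the filter was applied once or repeatedly, so this sketch iterates until the dataset is stable, since removing sparse users can push recipes below their threshold and vice versa; the function and parameter names are ours:

```python
def filter_sparse(rating_events, min_item_ratings=4, min_user_ratings=6):
    """Keep only recipes rated more than 3 times and users with more than
    5 ratings, repeating until no further rows are removed.
    rating_events is a list of (user, item, rating) tuples."""
    events = list(rating_events)
    while True:
        user_counts, item_counts = {}, {}
        for u, i, _ in events:
            user_counts[u] = user_counts.get(u, 0) + 1
            item_counts[i] = item_counts.get(i, 0) + 1
        kept = [(u, i, r) for u, i, r in events
                if user_counts[u] >= min_user_ratings
                and item_counts[i] >= min_item_ratings]
        if len(kept) == len(events):
            return kept
        events = kept
```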
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Some examples of these features are:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South
American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, multiple dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen
by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures
4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating-event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, taking
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the chosen number of repetitions was 5, a setup also known
as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
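The 5-fold procedure described above can be sketched as follows. This is a generic illustration rather than the thesis code, and assigning folds by shuffled slicing is our own choice:

```python
import random

def k_fold_splits(rating_events, k=5, seed=0):
    """Yield (training, validation) pairs for k-fold cross-validation:
    each fold serves exactly once as the validation set (20% of the data
    for k = 5), with the remaining folds forming the training set."""
    events = list(rating_events)
    random.Random(seed).shuffle(events)          # fixed seed for reproducibility
    folds = [events[i::k] for i in range(k)]     # k near-equal folds
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```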
Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example
in the following format:

• User identification: userID;

• Item identification: itemID;

• Rating attributed by the userID to the itemID: rating.

By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's previously
rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
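Both metrics can be written directly from their standard definitions. A minimal sketch:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute deviation between the
    predicted and the actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but penalizing large
    deviations more heavily."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))
```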
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item averages,
i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                 Epicurious          Food.com
                                 MAE      RMSE       MAE      RMSE
  YoLP content-based component   0.6389   0.8279     0.3590   0.6536
  YoLP collaborative component   0.6454   0.8678     0.3761   0.6834
  User average                   0.6315   0.8338     0.4077   0.6207
  Item average                   0.7701   1.0930     0.4385   0.7043
  Combined average               0.6628   0.8572     0.4180   0.6250
Table 5.2: Test results

                                     Epicurious                          Food.com
                                     Obs. User Avg.   Obs. Fixed Thr.   Obs. User Avg.   Obs. Fixed Thr.
                                     MAE     RMSE     MAE     RMSE      MAE     RMSE     MAE     RMSE
  User avg. + user std. deviation    0.8217  1.0606   0.7759  1.0283    0.4448  0.6812   0.4287  0.6624
  Item avg. + item std. deviation    0.8914  1.1550   0.8388  1.1106    0.4561  0.7251   0.4507  0.7207
  User/item avg. + user and item
  std. deviation                     0.8304  1.0296   0.7824  0.9927    0.4390  0.6506   0.4324  0.6449
  Min-Max                            0.8539  1.1533   0.7721  1.0705    0.6648  0.9847   0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user's average rating value as the threshold between
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods correspond to the line entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that the combination of both user and item average ratings and standard deviations has the overall lowest error values.
Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved
Table 5.3: Testing features

                                        Epicurious            Food.com
                                        MAE       RMSE        MAE       RMSE
    Ingredients + Cuisine + Dietaries   0.7824    0.9927      0.4324    0.6449
    Ingredients + Cuisine               0.7915    1.0012      0.4384    0.6502
    Ingredients + Dietary               0.7874    0.9986      0.4342    0.6468
    Cuisine + Dietary                   0.8266    1.0616      0.4324    0.7087
    Ingredients                         0.7932    1.0054      0.4411    0.6537
    Cuisine                             0.8553    1.0810      0.5357    0.7431
    Dietary                             0.8772    1.0807      0.4579    0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated: in practice, 3 vectors were created and stored for each user, one per feature type. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
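This per-feature storage scheme can be sketched as follows. The snippet is a minimal illustration, not the thesis implementation: the dictionary-based sparse vectors, the toy weights, and names such as `merge_prototype` are all assumptions.

```python
import math

def merge_prototype(stored, features):
    """Merge the per-feature prototype vectors (e.g. 'ingredients',
    'cuisine', 'dietary') selected for a given feature test."""
    merged = {}
    for name in features:
        for term, weight in stored[name].items():
            merged[term] = merged.get(term, 0.0) + weight
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# One user, three stored prototype vectors (toy weights):
stored = {
    "ingredients": {"garlic": 1.2, "basil": 0.8},
    "cuisine": {"italian": 0.5},
    "dietary": {"vegetarian": 0.3},
}
recipe = {"garlic": 1.0, "italian": 1.0}

# Testing the Ingredients + Cuisine combination requires no rebuild;
# the two stored vectors are simply merged before the similarity step.
profile = merge_prototype(stored, ["ingredients", "cuisine"])
print(round(cosine(profile, recipe), 4))
```

Because the three vectors are kept separate, any of the seven feature combinations in Table 5.3 can be evaluated without recomputing the prototypes from the training folds.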
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them; although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:
\[ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases} \quad (4.6) \]
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
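The mapping of Eq. 4.6 can be sketched as follows (a minimal illustration; the function name is an assumption, and only the initial thresholds of 0.75 and 0.25 come from the text):

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Transform a Rocchio similarity value into a rating prediction (Eq. 4.6):
    add the standard deviation above the upper threshold, subtract it below
    the lower threshold, and return the plain average in between."""
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev

# A user averaging 3.5 with a standard deviation of 0.8:
print(similarity_to_rating(0.9, 3.5, 0.8))   # high-similarity case
print(similarity_to_rating(0.5, 3.5, 0.8))   # middle case
print(similarity_to_rating(0.1, 3.5, 0.8))   # low-similarity case
```

The threshold variation tests below amount to sweeping `lower` over [0, 0.25] and `upper` over [0, 0.75] and re-running the cross-validation for each setting.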
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.
As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
\[ Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases} \quad (5.1) \]
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
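The MAE/RMSE trade-off described above can be reproduced numerically. The following is an illustrative toy example, not thesis data: predictor B hits the exact rating more often (lower MAE) but misses by larger margins when it fails (higher RMSE).

```python
import math

def mae(errors):
    """Mean absolute error over a list of prediction deviations."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error; large deviations are penalized more."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Predictor A: conservative, small but frequent deviations.
a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
# Predictor B: exact most of the time, with occasional large misses.
b = [0.0, 0.0, 0.0, 0.0, 1.2, -1.2]

print(mae(a), rmse(a))
print(mae(b), rmse(b))
```

Predictor B ends up with the lower MAE but the higher RMSE, which is exactly the behaviour observed when lowering the upper similarity threshold.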
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small size of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
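The incremental protocol described above, where one review per round moves from the validation set to the training set, can be sketched as follows. This is an illustrative skeleton: `build_prototype` and `predict` stand in for the thesis components, and the toy stand-ins at the bottom are purely hypothetical.

```python
def learning_curve(reviews, build_prototype, predict, max_rounds):
    """Simulate continuous learning: each round, one more review joins the
    training set, a profile is rebuilt, and the mean absolute error over
    the remaining validation reviews is recorded."""
    curve = []
    for n in range(1, max_rounds + 1):
        training, validation = reviews[:n], reviews[n:]
        if not validation:
            break
        profile = build_prototype(training)
        errors = [abs(predict(profile, recipe) - rating)
                  for recipe, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve

# Toy stand-ins: the "profile" is just the mean of the training ratings,
# and the prediction ignores the recipe entirely.
reviews = [("r%d" % i, r) for i, r in enumerate([4, 5, 3, 4, 4, 5, 4, 3])]
mean_profile = lambda tr: sum(r for _, r in tr) / len(tr)
predict = lambda profile, recipe: profile
print(learning_curve(reviews, mean_profile, predict, 5))
```

With the real Rocchio components plugged in, plotting the returned list against the round number reproduces the curves of Figs. 5.8 to 5.10.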
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
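This future direction could be sketched as follows. The snippet is a speculative illustration of the idea, not an implementation from the thesis; all names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_class_vectors(rated_recipes):
    """One prototype vector per rating value: each class accumulates the
    feature weights of the recipes the user rated with that value."""
    classes = {}
    for features, rating in rated_recipes:
        proto = classes.setdefault(rating, {})
        for term, weight in features.items():
            proto[term] = proto.get(term, 0.0) + weight
    return classes

def predict_rating(classes, recipe):
    """The predicted rating is the class whose vector is most similar to
    the recipe's feature vector -- no similarity-to-rating step needed."""
    return max(classes, key=lambda rating: cosine(classes[rating], recipe))

# Toy example: the user rated garlic-heavy recipes 5 and sweet ones 2.
rated = [({"garlic": 1.0, "pasta": 0.5}, 5),
         ({"sugar": 1.0, "flour": 0.7}, 2),
         ({"garlic": 0.8, "basil": 0.6}, 5)]
classes = build_class_vectors(rated)
print(predict_rating(classes, {"garlic": 1.0}))
```

Here the class label itself is the rating, so the highest-similarity class directly yields the prediction, as suggested above.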
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems – a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Chapter 4
Architecture
In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is made, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.
The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These provide independent recommendations for the same input, in order to evaluate improvements in the prediction accuracy of the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics. The methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python1.
4.1 YoLP Collaborative Recommendation Component
The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity value between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity value between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user, two users are considered similar if they rate the same set of items in a similar way, whereas in the item-to-item approach two items are considered similar if they were rated in a similar way by the
1 https://www.python.org
Figure 4.1: System Architecture
Figure 4.2: Item-to-item collaborative recommendation2
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

\[ sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \quad (4.1) \]

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and lastly \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

\[ pred(u,a) = \frac{\sum_{b \in N} sim(a,b) \cdot (r_{b,u} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)} \quad (4.2) \]
2 http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is then computed by normalizing with the sum of similarities.
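Eqs. 4.1 and 4.2 can be sketched as follows. This is a minimal illustration over a toy rating dictionary; the data and all function names are assumptions, not the YoLP implementation.

```python
import math

ratings = {
    "ana":   {"risotto": 5, "pizza": 4, "sushi": 2},
    "bruno": {"risotto": 4, "pizza": 5, "sushi": 1},
    "carla": {"risotto": 2, "pizza": 1, "sushi": 5},
}

def item_avg(ratings, item):
    """Average rating of an item over all users who rated it."""
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def pearson_item_sim(ratings, a, b):
    """Eq. 4.1: Pearson correlation between items a and b,
    computed over the users P who rated both."""
    shared = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not shared:
        return 0.0
    ra, rb = item_avg(ratings, a), item_avg(ratings, b)
    num = sum((ratings[u][a] - ra) * (ratings[u][b] - rb) for u in shared)
    den = (math.sqrt(sum((ratings[u][a] - ra) ** 2 for u in shared)) *
           math.sqrt(sum((ratings[u][b] - rb) ** 2 for u in shared)))
    return num / den if den else 0.0

def predict(ratings, user, a):
    """Eq. 4.2: similarity-weighted combination of the deviations of the
    user's rated items, normalized by the sum of similarities."""
    num = den = 0.0
    for b, rub in ratings[user].items():
        if b == a:
            continue
        s = pearson_item_sim(ratings, a, b)
        num += s * (rub - item_avg(ratings, b))
        den += s
    return num / den if den else 0.0

print(round(pearson_item_sim(ratings, "risotto", "pizza"), 3))
```

In this toy data, risotto and pizza are rated alike by the same users and come out strongly positively correlated, while risotto and sushi are negatively correlated.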
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
\[ Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases} \quad (4.3) \]
where avgTotal represents the combined user and item average for each recommendation. It is thus important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature, Fk, is assumed to be always 1. The main reason is the absence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
\[ IRF_k = \log \frac{M}{M_k} \quad (4.4) \]
where M is the total number of recipes and Mk is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
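Eq. 4.4 can be sketched as follows, with a toy set of recipes; the function name and data are assumptions made for illustration.

```python
import math

def irf_weights(recipes):
    """Eq. 4.4: IRF_k = log(M / M_k), where M is the total number of
    recipes and M_k is the number of recipes containing feature k."""
    m = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(m / mk) for k, mk in counts.items()}

recipes = [
    {"salt", "garlic", "pasta"},
    {"salt", "sugar", "flour"},
    {"salt", "garlic", "basil"},
    {"salt", "rice", "saffron"},
]
weights = irf_weights(recipes)
print(weights["salt"], weights["garlic"])
```

As with IDF for words, a ubiquitous feature like salt gets weight 0 (log 1), while rarer, more discriminative ingredients get larger weights.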
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in the next section, Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
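This construction can be sketched as follows, for the fixed-threshold variant on the Food.com 1–5 scale (ratings below 3 negative, above 3 positive, equal to 3 neutral). The function names, toy IRF values, and ratings are assumptions for illustration.

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Add the IRF weights of positively rated recipes to the user's
    prototype vector and subtract those of negatively rated ones.
    Ratings equal to the threshold are treated as neutral and skipped."""
    prototype = {}
    for features, rating in rated_recipes:
        if rating == threshold:
            continue                       # neutral observation, ignored
        sign = 1.0 if rating > threshold else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype

# Toy IRF weights and three rating events for one user:
irf = {"garlic": 0.69, "sugar": 1.39, "salt": 0.0}
rated = [({"garlic", "salt"}, 5),    # positive observation
         ({"sugar", "salt"}, 1),     # negative observation
         ({"garlic"}, 3)]            # neutral, skipped
proto = build_prototype(rated, irf)
print(proto)
```

The Epicurious variant would use its 1–4 scale (1 and 2 negative, 3 and 4 positive), and the second approach would simply replace the fixed threshold with the user's training-set average.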
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews of recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users on recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile
29
vector so a method is needed to translate the similarity into a rating This topic is very important to
explore since it can introduce considerate errors in the validation results Next two approaches are
presented to translate the similarity value into a rating
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of
normalization methods available; the technique chosen for this work was Min-Max Normalization.
Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually
for each user, since not all users rate items the same way or have the same notion of high or low
rating values. So, the following steps were applied: compute each user's similarity variation from
the validation set, and compute each user's rating variation from the training set. At this point,
the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization
formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases
where there were not enough user ratings to compute the similarity interval (maximum value of A
− minimum value of A), the user average was used as the default for the recommendation.
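A per-user mapping following Eq. 4.5 could look like the sketch below (an illustrative implementation, not the system's code); the fallback to the user average for a degenerate similarity interval is signalled here with an exception, leaving the fallback itself to the caller:

```python
def min_max_rating(similarity, sim_min, sim_max, rating_min, rating_max):
    """Map a similarity value into a user's rating range (Eq. 4.5).

    `sim_min`/`sim_max` describe the user's similarity variation (from the
    validation set) and `rating_min`/`rating_max` the user's rating
    variation (from the training set).
    """
    span = sim_max - sim_min
    if span == 0:
        # Not enough ratings to compute the similarity interval:
        # the caller should fall back to the user's average rating.
        raise ValueError("empty similarity interval; use the user average")
    return (similarity - sim_min) / span * (rating_max - rating_min) + rating_min

# A similarity of 0.5 on a [0.0, 1.0] scale, mapped into ratings [1, 4]:
print(min_max_rating(0.5, 0.0, 1.0, 1, 4))  # 2.5
```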
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good
results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,   if similarity ≥ U
         average rating,                        if L ≤ similarity < U      (4.6)
         average rating − standard deviation,   if similarity < L
Three different approaches were tested: using the user's rating average and the user standard
deviation; using the recipe's rating average and the recipe standard deviation; and using the combined
average of the user and the recipe averages and standard deviations.
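Eq. 4.6 translates directly into code. The sketch below is an illustrative version with the initial thresholds U = 0.75 and L = 0.25; the `average` and `std_dev` arguments may come from the user, the item, or their combination, matching the three approaches tested:

```python
def similarity_to_rating(similarity, average, std_dev, upper=0.75, lower=0.25):
    """Translate a Rocchio similarity value into a rating (Eq. 4.6)."""
    if similarity >= upper:
        return average + std_dev
    if similarity >= lower:
        return average
    return average - std_dev

# A user with an average rating of 3.0 and a standard deviation of 1.0:
print(similarity_to_rating(0.8, 3.0, 1.0))  # 4.0
print(similarity_to_rating(0.5, 3.0, 1.0))  # 3.0
print(similarity_to_rating(0.1, 3.0, 1.0))  # 2.0
```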
This approach is very intuitive: when the similarity value between the recipe's features and the
user profile is high, the recipe's features are similar to the user's preferences, which should
yield a higher rating value for the recipe. Since the notion of a high rating value varies between
users and recipes, their averages and standard deviations can help determine with more accuracy
the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity
threshold values used in this method, U and L respectively, will be optimized to obtain the best
recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L
is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
Number of users                       24741       8117
Number of food items                  226025      14976
Number of rating events               956826      86574
Number of ratings above avg           726467      46588
Number of groups                      108         68
Number of ingredients                 5074        338
Number of categories                  28          14
Sparsity of the ratings matrix        0.02%       0.07%
Avg rating value                      4.68        3.34
Avg number of ratings per user        38.67       10.67
Avg number of ratings per item        4.23        5.78
Avg number of ingredients per item    8.57        3.71
Avg number of categories per item     2.33        0.60
Avg number of food groups per item    0.87        0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation
system in order to generate recommendations. The data for the experiments is provided by
two datasets. The first dataset was previously made available by [25], collected from a large online4
recipe sharing community. The second dataset is composed of crawled data obtained from a
website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated
recipes, but in order to reduce data sparsity the dataset has been filtered: all recipes that were rated
no more than 3 times were removed, as well as the users who rated no more than 5 times. Table
4.1 presents a statistical characterization of the two datasets after the filter was applied.
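The filtering step can be sketched as below. This is a single-pass illustration with invented data; the thesis does not specify whether the filter was iterated until a fixed point, which a real pipeline might do since removing users changes the item counts:

```python
from collections import Counter

def filter_sparse(ratings, min_item_ratings=4, min_user_ratings=6):
    """Filter rating events to reduce sparsity, as done for Epicurious.

    `ratings` is a list of (user, item, rating) tuples. Recipes rated no
    more than 3 times and users who rated no more than 5 times are
    removed, hence the minimums of 4 and 6.
    """
    item_counts = Counter(item for _, item, _ in ratings)
    user_counts = Counter(user for user, _, _ in ratings)
    return [(u, i, r) for u, i, r in ratings
            if item_counts[i] >= min_item_ratings
            and user_counts[u] >= min_user_ratings]

# Six users with six ratings each survive; a one-review user is dropped.
data = [(f"u{u}", f"i{i}", 4) for u in range(6) for i in range(6)]
data.append(("u9", "i0", 3))
kept = filter_sparse(data)
```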
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the
following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry,
Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop,
Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese,
Central/South American, European, Mexican, Latin American, American, Greek, Indian, German,
Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No
Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to
it. The main difference between the recipes' features in these datasets is the way that ingredients are
represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in
Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen
by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data for the datasets. Figures
4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5
shows the distribution of the number of users per number of rated items for the Epicurious dataset.
This last graph is not presented for the Food.com dataset because its curve would be very similar,
since a decrease in the number of users as the number of rated items increases is a normal
characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent
for representing and working with structured sets of data, which is perfectly adequate for the
objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines,
and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by the discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting topics of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation set.
Ideally, this process is repeated until all possible combinations of p are tested. The validation results
are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments
performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known
as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the
remaining 80% of the data.
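A minimal sketch of such a 5-fold split over rating events is shown below. The shuffle-then-partition strategy is an assumption, since the thesis does not detail how the folds were drawn:

```python
import random

def five_fold_splits(rating_events, seed=0):
    """Yield (training, validation) pairs for 5-fold cross-validation.

    The events are shuffled once and partitioned into 5 folds; each fold
    in turn serves as the validation set (20% of the data) while the
    remaining folds form the training set (80%).
    """
    events = list(rating_events)
    random.Random(seed).shuffle(events)
    folds = [events[i::5] for i in range(5)]  # 5 near-equal partitions
    for k in range(5):
        validation = folds[k]
        training = [e for j, fold in enumerate(folds) if j != k for e in fold]
        yield training, validation

events = list(range(100))
for training, validation in five_fold_splits(events):
    assert len(validation) == 20 and len(training) == 80
```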
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of
the system (i.e., the prediction values). In the simplest case, the validation set presents information
in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated
by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned
in Section 2.2, these measures compute the deviation between the predicted ratings and the
actual ratings. The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components, as well as to validate new variations of
content-based algorithms.
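Both measures are straightforward to compute; the following sketch shows their standard definitions applied to lists of predicted and actual ratings:

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; larger deviations are penalized more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predictions, truth = [4, 3, 5, 2], [4, 4, 3, 2]
print(mae(predictions, truth))   # 0.75
print(rmse(predictions, truth))  # ≈ 1.1180
```

The single two-point miss inflates RMSE well above MAE here, which is exactly the asymmetry exploited in the threshold analysis of Section 5.4.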
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, first some
baselines need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following:
user average rating, recipe average rating, and the combined average of the user and item averages,
i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
inputs, the recommendation system simply returns the userID average, the recipeID average, or
the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                                   Epicurious          Food.com
                                                   MAE      RMSE       MAE      RMSE
YoLP Content-based component                       0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component                       0.6454   0.8678     0.3761   0.6834
User Average                                       0.6315   0.8338     0.4077   0.6207
Item Average                                       0.7701   1.0930     0.4385   0.7043
Combined Average                                   0.6628   0.8572     0.4180   0.6250

Table 5.2: Test results

                                                   Epicurious                            Food.com
                                                   Observation       Observation        Observation       Observation
                                                   User Average      Fixed Threshold    User Average      Fixed Threshold
                                                   MAE      RMSE     MAE      RMSE      MAE      RMSE     MAE      RMSE
User Avg + User Standard Deviation                 0.8217   1.0606   0.7759   1.0283    0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard Deviation                 0.8914   1.1550   0.8388   1.1106    0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and Item Standard Deviation   0.8304   1.0296   0.7824   0.9927    0.4390   0.6506   0.4324   0.6449
Min-Max                                            0.8539   1.1533   0.7721   1.0705    0.6648   0.9847   0.6303   0.9384
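The three average-based baselines can be sketched as follows (illustrative code with invented data; on unseen users or items a real system would need a global-average fallback, which is omitted here):

```python
def baseline_predictors(training):
    """Build the three average-based baselines from training rating events.

    `training` is a list of (user, item, rating) tuples. The returned
    functions predict a rating from (user, item) using the user average,
    the item average, or (UserAvg + ItemAvg) / 2.
    """
    user_ratings, item_ratings = {}, {}
    for user, item, rating in training:
        user_ratings.setdefault(user, []).append(rating)
        item_ratings.setdefault(item, []).append(rating)
    user_avg = {u: sum(rs) / len(rs) for u, rs in user_ratings.items()}
    item_avg = {i: sum(rs) / len(rs) for i, rs in item_ratings.items()}
    return (lambda u, i: user_avg[u],                       # User Average
            lambda u, i: item_avg[i],                       # Item Average
            lambda u, i: (user_avg[u] + item_avg[i]) / 2)   # Combined Average

train = [("ann", "stew", 4), ("ann", "pie", 2), ("bob", "stew", 5)]
by_user, by_item, combined = baseline_predictors(train)
print(by_user("ann", "stew"))   # 3.0
print(by_item("ann", "stew"))   # 4.5
print(combined("ann", "stew"))  # 3.75
```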
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the users' prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the line entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using the fixed
threshold of 3 to separate the positive and negative observations. The second conclusion that can
be drawn from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
and adjusted to return the best recommendations.

Table 5.3: Testing features

                                      Epicurious          Food.com
                                      MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries     0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                 0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                 0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                     0.8266   1.0616     0.4324   0.7087
Ingredients                           0.7932   1.0054     0.4411   0.6537
Cuisine                               0.8553   1.0810     0.5357   0.7431
Dietary                               0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine,
and dietary. In content-based methods it is important to determine whether all features are helping to obtain
the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was
the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform: for each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
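The per-type storage described above can be sketched like this (hypothetical names; feature names are assumed unique across types, so a plain dictionary merge suffices):

```python
def merge_prototype(stored, active_feature_types):
    """Merge per-feature-type prototype vectors for a feature test.

    `stored` maps a feature type ("ingredients", "cuisine", "dietary") to
    the user's partial prototype vector for that type. Keeping the 3
    vectors separate avoids rebuilding prototypes for every feature
    combination, since any subset can be merged on demand.
    """
    merged = {}
    for feature_type in active_feature_types:
        merged.update(stored[feature_type])
    return merged

stored = {
    "ingredients": {"tomato": 0.9, "basil": 0.4},
    "cuisine": {"italian": 1.2},
    "dietary": {"vegetarian": 0.6},
}
# Test the Ingredients + Cuisine combination without rebuilding anything:
print(merge_prototype(stored, ["ingredients", "cuisine"]))
```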
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them. Although this is
confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for
example the price of a meal, can increase the correlation between the user's preferences and items
he dislikes, so it is important to test the impact of every new feature before implementing it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

Rating = average rating + standard deviation,   if similarity ≥ U
         average rating,                        if L ≤ similarity < U
         average rating − standard deviation,   if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but now other cases need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendation and discover the similarity case thresholds
that return the lowest error values.
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:

Rating = average rating + standard deviation,   if similarity ≥ U
         average rating,                        if similarity < U          (5.1)

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests
multiple times on the experimental recommendation component, adjusting the upper similarity value
between tests.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it is predicting the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better
results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and whether the
absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph
according to the user's absolute error and standard deviation values. The line in these two graphs
indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small size of this dataset and
the lighter density of points in the graph towards the higher values of standard deviation, there probably
was not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate
for users with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the objective
of this test is to simulate the continuous learning of the algorithm, using the datasets studied in
this work, and to analyse whether the recommendation error starts to converge after a determined amount of
reviews. In order to perform this test, the datasets were first analysed to find a group of
users with enough rated recipes to study the improvements in the recommendation. The Epicurious
dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest
chosen threshold for this dataset, in order to maintain a considerable amount of users to average the
recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100
recipes, and since the results of this experiment showed a consistent drop in the errors measured,
as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen
in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
error; after 25 rated recipes, the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress
of the recommendation error, due to the small size of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendation, although there is not a clear number of rated recipes that marks
a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation
was explored. Using the well-known Rocchio algorithm, several approaches were tested
to further explore the breaking down of recipes into ingredients presented in [22], and to use more
variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various
approaches were tested to build the users' prototype vectors and to transform the similarity value
returned by the algorithm into a rating value, needed to compute the performance of the recommendation
system. When building the prototype vectors, the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations. The combination of both
user and item average ratings and standard deviations demonstrated the best results for transforming
the similarity value into a rating value. Combined, these approaches returned the best performance
values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
These being two datasets with very different characteristics, not improving on the baseline results in both
was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only
contained the main ingredients, which were chosen by the user at the moment of the review, as opposed
to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of
detail, both in the recipes and in the prototype vectors; adding the major difference in dataset
sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since
there are very few studies related to food recommendation, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in Section
3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the
MAE metric, was obtained when compared to a pure content-based method [20]. This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example, the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e.,
lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these
features have on the recommendation is another interesting point to approach in the future,
when datasets with more information are available.
Instead of representing users as single classes in Rocchio, a set of class vectors created for each
user could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so, according to the user's preferences, the
vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction.
Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer
science and information systems - a landscape of research. In E-Commerce and Web
Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web,
4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive
algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages
412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender
systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms.
In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting
to know you: Learning new user preferences in recommender systems. In Proceedings
of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based
collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-
Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by
considering the user's preference and ingredient quantity of target recipe. In Proceedings
of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients.
In Proceedings of the 18th International Conference on User Modeling, Adaptation
and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling
and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
51
Chapter 4

Architecture

In this chapter, the modules that compose the architecture of the recommendation system are presented. First, an introduction to the recommendation module is given, followed by the specification of the methods used in the different recommendation components. Afterwards, the datasets chosen to validate this work are analyzed and the database platform is described.

The recommendation system contains three recommendation components (Fig. 4.1): the YoLP collaborative recommender, the YoLP content-based recommender, and an experimental recommendation component, where various approaches are explored to adapt Rocchio's algorithm for personalized food recommendations. These components provide independent recommendations for the same input, in order to evaluate improvements in prediction accuracy from the algorithms implemented in the experimental component. The evaluation module independently evaluates each recommendation component by measuring the performance of the algorithms using different metrics; the methods used in this module are explained in detail in the following chapter. The programming language used to develop these components was Python¹.
4.1 YoLP Collaborative Recommendation Component

The collaborative recommendation component implemented in YoLP uses an item-to-item collaborative approach [24]. This approach is very similar to the user-to-user approach explained in detail in Section 2.1.2.
In user-to-user, the similarity between a pair of users is measured by the way both users rate the same set of items, whereas in the item-to-item approach the similarity between a pair of items is measured from the way they are rated by a shared set of users. In other words, in user-to-user two users are considered similar if they rate the same set of items in a similar way, whereas in item-to-item two items are considered similar if they were rated in a similar way by the same group of users.

¹ https://www.python.org

Figure 4.1: System Architecture

Figure 4.2: Item-to-item collaborative recommendation²
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b, and, lastly, \bar{r}_a and \bar{r}_b are the average ratings of recipe a and recipe b, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot (r_{u,b} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}    (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the weighted sum is normalized by the sum of similarities.
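The two equations above can be sketched in Python, the language used in this work. This is a minimal illustration assuming ratings are stored as user-to-rating dictionaries per item; the function names are ours, not part of the YoLP code:

```python
from math import sqrt

def pearson_item_sim(ratings_a, ratings_b):
    """Item-to-item Pearson correlation (Eq. 4.1).

    ratings_a and ratings_b map user -> rating; the sums run over the
    users P who rated both items, while the item averages are taken
    over all of each item's ratings.
    """
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in shared)
    den = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in shared)) * \
          sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in shared))
    return num / den if den else 0.0

def predict(user_ratings, item_means, target, all_ratings):
    """Weighted prediction over the user's rated items N (Eq. 4.2)."""
    num = den = 0.0
    for b, r_ub in user_ratings.items():
        s = pearson_item_sim(all_ratings[target], all_ratings[b])
        num += s * (r_ub - item_means[b])
        den += s  # Eq. 4.2 normalizes by the plain sum of similarities
    return num / den if den else 0.0
```

Restricting the candidate items to the recipes of a single restaurant, as YoLP does, keeps the number of similarity computations per recommendation small.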
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and to compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence suggests that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component

The YoLP content-based component generates recommendations by comparing the features of the restaurant's recipes with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are category, region, restaurant ID, and ingredients. Context features are also considered at the moment of recommendation; these are temperature, period of the day, and season of the year. Each feature has a specific position attributed to it in the recipe and user profile sparse vectors.

The user profile is composed of binary values for the features of the recipes that the user rated positively, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since this way of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
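The comparison step and the rating transform of Eq. 4.3 can be sketched as follows; the feature names and the avg_total argument are illustrative assumptions, not taken from the YoLP implementation:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts feature -> weight)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def similarity_to_rating(sim, avg_total):
    """Eq. 4.3: boost the combined user/item average for near-perfect matches."""
    return avg_total + 0.5 if sim > 0.8 else avg_total

# Hypothetical binary profile and recipe vectors; 3 shared features
# out of 4 on each side give a cosine similarity of 0.75.
profile = {"cuisine:Italian": 1, "ing:Tomato": 1, "ing:Basil": 1, "ctx:evening": 1}
recipe = {"cuisine:Italian": 1, "ing:Tomato": 1, "ing:Basil": 1, "ing:Garlic": 1}
sim = cosine(profile, recipe)
```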
4.3 Experimental Recommendation Component

This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the user's favourite ingredients using the TF-IDF variation [3], the user's overall ingredient preferences could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the user's preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF

As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the Frequency of use of the feature F_k is assumed to always be 1. The main reason is the absence of timestamps in the datasets' reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:

IRF_k = \log \frac{M}{M_k}    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
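The weight computation can be sketched as below, assuming recipes are given as sets of ingredient features; the helper name and toy data are ours:

```python
from math import log

def irf_weights(recipes):
    """IRF_k = log(M / M_k) for every ingredient k in a corpus of recipes (Eq. 4.4)."""
    m = len(recipes)
    counts = {}
    for ingredients in recipes:
        for k in ingredients:
            counts[k] = counts.get(k, 0) + 1
    return {k: log(m / mk) for k, mk in counts.items()}

# Toy corpus: "tomato" appears in 3 of 4 recipes, "mint" in only 1,
# so the rarer ingredient receives the larger weight.
weights = irf_weights([{"tomato", "basil"}, {"tomato", "garlic"},
                       {"basil", "mint"}, {"tomato"}])
```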
4.3.2 Building the Users' Prototype Vector

The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, a positive observation.

As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
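The update rule can be sketched as follows, using the fixed-threshold variant for the positive/negative decision; the function and variable names are illustrative, and the Food.com neutral-rating case is only noted in a comment:

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Build a user's prototype vector from (feature set, rating) pairs.

    Positive observations add the IRF feature weights, negative
    observations subtract them, both with equal weight 1.
    """
    proto = {}
    for features, rating in rated_recipes:
        # For Food.com, a rating of exactly 3 would be neutral and skipped.
        sign = 1 if rating >= threshold else -1
        for f in features:
            proto[f] = proto.get(f, 0.0) + sign * irf.get(f, 0.0)
    return proto

# A positive observation (rating 4) followed by a negative one (rating 1)
# that cancels the "tomato" contribution.
irf = {"tomato": 1.0, "mint": 2.0}
proto = build_prototype([({"tomato", "mint"}, 4), ({"tomato"}, 1)], irf)
```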
4.3.3 Generating a rating value from a similarity value

Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, and they contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method

The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)

In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
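The per-user mapping can be sketched as follows (the argument names are ours); the fallback mirrors the default described above:

```python
def min_max_rating(sim, user_sims, user_ratings, user_avg):
    """Map a similarity value into a user's own rating range (Eq. 4.5).

    user_sims are the user's observed similarity values and user_ratings
    the user's known ratings; falls back to the user's average when the
    similarity interval is degenerate.
    """
    lo, hi = min(user_sims), max(user_sims)
    c, d = min(user_ratings), max(user_ratings)
    if hi == lo:  # not enough variation to build a scale
        return user_avg
    return (sim - lo) / (hi - lo) * (d - c) + c
```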
Using average and standard deviation values from the training set

Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)

Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
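Eq. 4.6 itself is straightforward to express in code; the sketch below takes whichever average/standard-deviation pair (user, item, or combined) the caller supplies:

```python
def threshold_rating(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: shift the average by one standard deviation
    depending on where the similarity falls."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std
```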
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                     Food.com   Epicurious
Number of users                      24741      8117
Number of food items                 226025     14976
Number of rating events              956826     86574
Number of ratings above avg          726467     46588
Number of groups                     108        68
Number of ingredients                5074       338
Number of categories                 28         14
Sparsity of the ratings matrix       0.02%      0.07%
Avg rating value                     4.68       3.34
Avg number of ratings per user       38.67      10.67
Avg number of ratings per item       4.23       5.78
Avg number of ingredients per item   8.57       3.71
Avg number of categories per item    2.33       0.60
Avg number of food groups per item   0.87       0.61
4.4 Database and Datasets

The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online recipe-sharing community⁴. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after this filter was applied.

Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

⁴ http://www.food.com
⁵ http://www.epicurious.com

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website's users when performing a review.

Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.

⁶ http://www.mysql.com
Chapter 5

Validation

This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation

Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set; ideally, it is repeated until all possible combinations of p are tested. The validation results are averaged over the number of repetitions (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
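The evaluation loop can be sketched as follows; this is an illustrative implementation of the 5-fold split and the two error measures, not the thesis' actual evaluation module:

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def five_folds(events):
    """Yield (training, validation) splits: each fold holds out 20% of the events."""
    for i in range(5):
        validation = events[i::5]
        training = [e for j, e in enumerate(events) if j % 5 != i]
        yield training, validation
```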
5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                               Epicurious        Food.com
                               MAE      RMSE     MAE      RMSE
YoLP content-based component   0.6389   0.8279   0.3590   0.6536
YoLP collaborative component   0.6454   0.8678   0.3761   0.6834
User Average                   0.6315   0.8338   0.4077   0.6207
Item Average                   0.7701   1.0930   0.4385   0.7043
Combined Average               0.6628   0.8572   0.4180   0.6250

Table 5.2: Test results

                                Epicurious                           Food.com
                                Obs. User Avg    Obs. Fixed Thr.     Obs. User Avg    Obs. Fixed Thr.
                                MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
User Avg + User Std. Dev.       0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
Item Avg + Item Std. Dev.       0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
User/Item Avg + Std. Dev.       0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
Min-Max                         0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average (Obs. User Avg) and Observation Fixed Threshold (Obs. Fixed Thr.). Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the row entries of Table 5.2, and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.

Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                   Epicurious        Food.com
                                   MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietary    0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine              0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary              0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                  0.8266  1.0616    0.4324  0.7087
Ingredients                        0.7932  1.0054    0.4411  0.6537
Cuisine                            0.8553  1.0810    0.5357  0.7431
Dietary                            0.8772  1.0807    0.4579  0.7320
5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested; so, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietary), so the same results can be observed in the respective row of Table 5.3.
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, such as the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } L \leq similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \geq U \\ \text{average rating} & \text{if } similarity < U \end{cases}    (5.1)
Using Eq 51 it is time to test the upper similarity threshold Figures 54 and 55 present the test
results for theEpicurious and Foodcom datasets respectively For each similarity value represented
by the points in the graphs the MAE and RMSE were obtained using the same cross-validation tests
multiple times on the experimental recommendation component adjusting the upper similarity value
between each test
The results obtained were interesting As mentioned in Section 22 MAE computes the devia-
tion between predicted ratings and actual ratings RMSE is very similar to MAE but places more
emphasis on higher deviations These definitions help to understand the results of this test Both
datasets react in a very similar way when the threshold U is varied in the interval [0 075] The MAE
decreases and the RMSE increases When lowering the similarity threshold the recommendation
system predicts the correct rating value more times which results in a lower average error so the
MAE is lower But although it is predicting the exact rating value more times in the cases where it
misses the deviation between the predicted rating and the actual rating is higher and since RMSE
places more emphasis on higher deviations the RMSE values increase The best similarity threshold
is subjective some systems may benefit more from a higher rate of exact predictions while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable In this test
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Foodcom dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Foodcom dataset.
5.5 Standard Deviation Impact on Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
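One way such a trend line could be computed (a sketch under the assumption of simple fixed-width binning; the thesis does not specify the exact smoothing used) is to group the (standard deviation, absolute error) points into bins and average the error inside each bin:

```python
def binned_average(points, bin_width=0.25):
    """Group (std_dev, abs_error) points into standard-deviation bins and
    average the error in each bin -- one way to draw the trend line seen
    in Figs. 5.6 and 5.7.  Returns {bin_start: mean_abs_error}."""
    bins = {}
    for std, err in points:
        key = int(std / bin_width)
        bins.setdefault(key, []).append(err)
    return {round(k * bin_width, 2): sum(v) / len(v) for k, v in sorted(bins.items())}

# Hypothetical (std_dev, abs_error) pairs for a handful of users.
points = [(0.1, 0.2), (0.2, 0.4), (0.6, 0.5), (0.7, 0.7), (1.2, 0.9)]
```

For these hypothetical points, the binned averages increase with the standard deviation, which is the expected shape of the Epicurious curve.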
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Foodcom dataset.
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews is made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Foodcom, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
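The evaluation protocol described next (grow the training set one review per round and measure the error on the rest) can be sketched as follows. This is an illustrative reimplementation, not the thesis code: `train_model` and `predict_fn` stand in for the actual Rocchio training and prediction routines.

```python
def learning_curve(rated_recipes, train_model, predict_fn, max_rounds=40):
    """Simulate continuous learning: each round moves one more review into
    the training set and measures the MAE on the remaining reviews.
    `rated_recipes` is a user's ordered list of (recipe, rating) pairs;
    `train_model` and `predict_fn` are placeholders for the recommender."""
    errors = []
    for n in range(1, min(max_rounds, len(rated_recipes))):
        training, validation = rated_recipes[:n], rated_recipes[n:]
        model = train_model(training)
        abs_errs = [abs(rating - predict_fn(model, recipe))
                    for recipe, rating in validation]
        errors.append(sum(abs_errs) / len(abs_errs))
    return errors  # one MAE value per number of training reviews

# Trivial stand-ins: the "model" is just the mean training rating.
def mean_train(tr): return sum(r for _, r in tr) / len(tr)
def mean_predict(model, recipe): return model
```

Averaging these per-user curves over the selected user groups yields the curves plotted in Figures 5.8 to 5.10.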
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Foodcom dataset, up to 100 rated recipes.
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Foodcom dataset, up to 500 rated recipes.
a threshold where the recommendation error stagnates.
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Foodcom dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.
Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which are chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Foodcom dataset. This removes a lot of detail, both in the recipes and in the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing validation of the studied approaches. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example: season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823.
Figure 4.1: System architecture.
Figure 4.2: Item-to-item collaborative recommendation.²
same group of users.
The usual formula for computing item-to-item similarity is the Pearson correlation, defined as

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2 \sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}    (4.1)

where a and b are recipes, r_{a,p} is the rating from user p to recipe a, P is the group of users that rated both recipe a and recipe b and, lastly, \bar{r}_a and \bar{r}_b are recipe a's and recipe b's average ratings, respectively.
After the similarity is computed, the rating prediction is calculated using the following equation:

pred(u, a) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{u,b}}{\sum_{b \in N} sim(a, b)}    (4.2)
² http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
In this formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user's rating for each item b is weighted according to the similarity between b and the target item a, and the weighted sum is normalized by the sum of the similarities.
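Eqs. 4.1 and 4.2 can be sketched as follows. This is an illustrative reimplementation, not the YoLP code: the dict-of-dicts rating representation and the helper names (`item_avg`, `predict_rating`) are assumptions for the example.

```python
def item_avg(ratings, i):
    """Average rating of item i over all users that rated it."""
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)

def pearson_sim(ratings, a, b):
    """Eq. 4.1: Pearson correlation between items a and b over the users P
    that rated both.  `ratings` maps user -> {item: rating}."""
    common = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not common:
        return 0.0
    avg_a, avg_b = item_avg(ratings, a), item_avg(ratings, b)
    num = sum((ratings[u][a] - avg_a) * (ratings[u][b] - avg_b) for u in common)
    den = (sum((ratings[u][a] - avg_a) ** 2 for u in common)
           * sum((ratings[u][b] - avg_b) ** 2 for u in common)) ** 0.5
    return num / den if den else 0.0

def predict_rating(ratings, user, a):
    """Eq. 4.2: similarity-weighted average of the user's own ratings."""
    sims = {b: pearson_sim(ratings, a, b) for b in ratings[user] if b != a}
    norm = sum(sims.values())
    if norm == 0:
        return None
    return sum(s * ratings[user][b] for b, s in sims.items()) / norm

# Toy data: items a and b are rated alike, item c is rated inversely.
ratings = {"u1": {"a": 5, "b": 5, "c": 1},
           "u2": {"a": 4, "b": 4, "c": 2},
           "u3": {"a": 1, "b": 2, "c": 5},
           "u4": {"b": 4}}
```

With this toy data, a and b correlate strongly, a and c correlate negatively, and u4's prediction for a is driven entirely by the similar item b.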
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant's recipes, and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of representing recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are: category, region, restaurant ID and ingredients. Context features are also considered at the moment of the recommendation; these are: temperature, period of the day and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors.
The user profile is composed of binary values of the recipe features that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe's features are added as binary values to the profile vector.
YoLP recipe recommendations take the form of a list and, in order to use the Epicurious and Foodcom datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:

Rating = \begin{cases} avgTotal + 0.5 & \text{if } similarity > 0.8 \\ avgTotal & \text{otherwise} \end{cases}    (4.3)
where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Foodcom datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. 3.1 and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature F_k is assumed to be always 1. The main reason is the inexistence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = \log \frac{M}{M_k}    (4.4)
where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
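Eq. 4.4 amounts to a single counting pass over the dataset. A minimal sketch (assuming recipes are given as sets of feature names; not the thesis code):

```python
import math

def irf_weights(recipes):
    """Eq. 4.4: IRF_k = log(M / M_k) for every feature k, where M is the
    total number of recipes and M_k the number of recipes containing k.
    `recipes` is a list of feature sets."""
    m = len(recipes)
    counts = {}
    for features in recipes:
        for k in features:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(m / mk) for k, mk in counts.items()}
```

As with IDF, ubiquitous ingredients (e.g. salt) get weights near zero, while rare, discriminative ingredients get the highest weights.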
4.3.2 Building the Users' Prototype Vectors
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determines the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered a negative observation and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Foodcom dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Foodcom, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile
vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula:³

B = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as default for the recommendation.
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = \begin{cases} \text{average rating} + \text{standard deviation} & \text{if } similarity \ge U \\ \text{average rating} & \text{if } L \le similarity < U \\ \text{average rating} - \text{standard deviation} & \text{if } similarity < L \end{cases}    (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments.

                                     Foodcom   Epicurious
Number of users                       24,741        8,117
Number of food items                 226,025       14,976
Number of rating events              956,826       86,574
Number of ratings above avg          726,467       46,588
Number of groups                         108           68
Number of ingredients                  5,074          338
Number of categories                      28           14
Sparsity on the ratings matrix         0.02%        0.07%
Avg rating values                       4.68         3.34
Avg number of ratings per user         38.67        10.67
Avg number of ratings per item          4.23         5.78
Avg number of ingredients per item      8.57         3.71
Avg number of categories per item       2.33         0.60
Avg number of food groups per item      0.87         0.61
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine with more accuracy the final rating recommended for the recipe. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization for the two datasets after the filter was applied.
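The filtering step described above can be sketched as follows. Note that removing sparse users can push items below their threshold (and vice versa), so one reasonable reading, assumed here, is to repeat the pruning until it stabilizes; the thesis does not state whether a single pass or an iterative pass was used.

```python
def filter_sparse(rating_events, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times and users who rated no more
    than 5 times, repeating until stable, since each pass can create new
    violations.  `rating_events` is a list of (user, item, rating) tuples."""
    events = list(rating_events)
    while True:
        users, items = {}, {}
        for u, i, _ in events:
            users[u] = users.get(u, 0) + 1
            items[i] = items.get(i, 0) + 1
        kept = [(u, i, r) for u, i, r in events
                if users[u] >= min_user_ratings and items[i] >= min_item_ratings]
        if len(kept) == len(events):
            return kept
        events = kept

# Hypothetical data: a dense 6x6 block of ratings plus one sparse user "x".
events = [("u%d" % u, "i%d" % i, 4) for u in range(6) for i in range(6)]
events.append(("x", "i0", 5))
```

Running the filter on this hypothetical data removes only the single-rating user "x" and keeps the dense block intact.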
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com   ⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values.
Figure 4.4: Distribution of Foodcom rating events per rating values.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Foodcom, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4 and 4.5 present some graphical statistical data of the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Foodcom dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines and dietaries) and the users' prototype vectors.
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different sets of p observations as the validation set. Ideally, the process is repeated until all possible combinations of p observations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was split into 5 folds and the process repeated 5 times, a procedure also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
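As a sketch, the 5-fold split described above can be implemented as follows. This is a minimal illustration (not the actual evaluation code used in this work), and the rating events are hypothetical (userID, itemID, rating) triples.

```python
import random

def k_fold_splits(events, k=5, seed=42):
    """Partition rating events into k folds and yield (train, validation) pairs.

    Each fold serves once as the validation set (20% of the data for k=5),
    while the remaining folds form the training set (80%).
    """
    events = list(events)
    random.Random(seed).shuffle(events)
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, validation

# Hypothetical rating events: (userID, itemID, rating)
events = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(10)]
for train, validation in k_fold_splits(events):
    assert len(train) == 80 and len(validation) == 20
```

Each of the 5 rounds trains on 80% of the events and validates on the held-out 20%, matching the split described above.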
Accuracy is measured when comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-Fold Cross-Validation example.
in the following format:
• User identification: userID;
• Item identification: itemID;
• Rating attributed by the userID to the itemID: rating.
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
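The two error measures can be computed directly from the paired lists of actual and predicted ratings; a minimal sketch:

```python
import math

def mae(actual, predicted):
    # Mean Absolute Error: average absolute deviation between ratings
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root Mean Squared Error: like MAE, but emphasizes larger deviations
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 3, 5, 2]        # ratings from the validation set
predicted = [3.5, 3, 4, 4]   # ratings generated by the recommender
print(round(mae(actual, predicted), 4))   # 0.875
print(round(rmse(actual, predicted), 4))  # 1.1456
```

Note how the single large miss (predicting 4 for an actual 2) inflates the RMSE more than the MAE, which is exactly the behaviour discussed later in the threshold-variation test.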
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines

                                  Epicurious            Food.com
                                  MAE      RMSE         MAE      RMSE
  YoLP Content-based component    0.6389   0.8279       0.3590   0.6536
  YoLP Collaborative component    0.6454   0.8678       0.3761   0.6834
  User Average                    0.6315   0.8338       0.4077   0.6207
  Item Average                    0.7701   1.0930       0.4385   0.7043
  Combined Average                0.6628   0.8572       0.4180   0.6250
Table 5.2: Test Results

                                          Epicurious                            Food.com
                               Observation:       Observation:       Observation:       Observation:
                               User Average       Fixed Threshold    User Average       Fixed Threshold
                               MAE      RMSE      MAE      RMSE      MAE      RMSE      MAE      RMSE
  User Avg + User Standard
  Deviation                    0.8217   1.0606    0.7759   1.0283    0.4448   0.6812    0.4287   0.6624
  Item Avg + Item Standard
  Deviation                    0.8914   1.1550    0.8388   1.1106    0.4561   0.7251    0.4507   0.7207
  User/Item Avg + User and
  Item Standard Deviation      0.8304   1.0296    0.7824   0.9927    0.4390   0.6506    0.4324   0.6449
  Min-Max                      0.8539   1.1533    0.7721   1.0705    0.6648   0.9847    0.6303   0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
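The three average-based baselines can be sketched as follows. The fall-back to the global average for users or recipes absent from the training set is an assumption for illustration; the thesis does not specify cold-start handling.

```python
from collections import defaultdict

def build_baselines(train):
    """Build user-average, item-average and combined-average predictors
    from (userID, itemID, rating) training events."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in train:
        by_user[u].append(r)
        by_item[i].append(r)
    global_avg = sum(r for _, _, r in train) / len(train)
    user_avg = {u: sum(rs) / len(rs) for u, rs in by_user.items()}
    item_avg = {i: sum(rs) / len(rs) for i, rs in by_item.items()}

    def predict(user, item, method="combined"):
        ua = user_avg.get(user, global_avg)  # fall back for unseen users/items
        ia = item_avg.get(item, global_avg)
        if method == "user":
            return ua
        if method == "item":
            return ia
        return (ua + ia) / 2                 # (UserAvg + ItemAvg) / 2

    return predict

train = [(1, "soup", 4), (1, "cake", 2), (2, "soup", 5)]  # toy training events
predict = build_baselines(train)
print(predict(1, "soup"))  # (3.0 + 4.5) / 2 = 3.75
```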
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation: User Average and Observation: Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features

                                       Epicurious            Food.com
                                       MAE      RMSE         MAE      RMSE
  Ingredients + Cuisine + Dietaries    0.7824   0.9927       0.4324   0.6449
  Ingredients + Cuisine                0.7915   1.0012       0.4384   0.6502
  Ingredients + Dietary                0.7874   0.9986       0.4342   0.6468
  Cuisine + Dietary                    0.8266   1.0616       0.4324   0.7087
  Ingredients                          0.7932   1.0054       0.4411   0.6537
  Cuisine                              0.8553   1.0810       0.5357   0.7431
  Dietary                              0.8772   1.0807       0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:
• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
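The per-feature storage scheme can be sketched as follows: three sparse vectors per user, merged on demand before the cosine computation. The feature keys and weights here are hypothetical.

```python
import math

def merge(vectors, active):
    """Merge the stored per-feature prototype vectors selected in `active`."""
    merged = {}
    for name in active:
        for key, weight in vectors[name].items():
            merged[key] = merged.get(key, 0.0) + weight
    return merged

def cosine(u, v):
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical per-feature prototype vectors for one user
prototype = {
    "ingredients": {"ing:garlic": 1.2, "ing:basil": 0.8},
    "cuisine": {"cui:italian": 0.9},
    "dietary": {"diet:vegetarian": 0.4},
}
recipe = {"ing:garlic": 1.0, "cui:italian": 1.0}

# A feature combination is tested by merging only the vectors it needs
all_features = merge(prototype, ["ingredients", "cuisine", "dietary"])
no_dietary = merge(prototype, ["ingredients", "cuisine"])
print(cosine(all_features, recipe) > 0)  # True
```

Testing a feature combination thus becomes a cheap merge of already-computed vectors, instead of rebuilding the prototype vectors from the rating events.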
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since we have more information available about them, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:
Rating =
    average rating + standard deviation,   if similarity >= U
    average rating,                        if L <= similarity < U
    average rating - standard deviation,   if similarity < L
(4.6)
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
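The case-based mapping can be sketched directly as a small function, with the initial thresholds U = 0.75 and L = 0.25 as defaults:

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Map a Rocchio similarity value to a rating (sketch of Eq. 4.6).

    `avg` and `std` are the average rating and standard deviation used by the
    chosen method; `upper` and `lower` are the case thresholds U and L.
    """
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

print(similarity_to_rating(0.9, avg=3.0, std=0.5))  # 3.5
print(similarity_to_rating(0.5, avg=3.0, std=0.5))  # 3.0
print(similarity_to_rating(0.1, avg=3.0, std=0.5))  # 2.5
```

Varying `lower` in [0, 0.25], and later removing the lower case entirely, reproduces the threshold-variation tests described in this section.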
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The sharp drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed. As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.
Rating =
    average rating + standard deviation,   if similarity >= U
    average rating,                        if similarity < U
(5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation from the Epicurious dataset.
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation from the Food.com dataset.
there was not enough data on users with high deviation for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain amount of reviews is made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.
error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
a threshold where the recommendation error stagnates.
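The simulation can be sketched as follows. The `fit` and `predict` stand-ins below are toy placeholders (a running average) for the actual Rocchio prototype-vector construction and rating prediction, and the events are hypothetical.

```python
def learning_curve(user_events, train_sizes, fit, predict):
    """For each training size n, build the user model from the first n rated
    recipes and measure the MAE over the user's remaining ratings."""
    curve = []
    for n in train_sizes:
        train, validation = user_events[:n], user_events[n:]
        model = fit(train)
        errors = [abs(rating - predict(model, item)) for item, rating in validation]
        curve.append(sum(errors) / len(errors))  # MAE at this training size
    return curve

# Toy stand-ins: the "model" is just the user's running average rating
fit = lambda events: sum(r for _, r in events) / len(events)
predict = lambda model, item: model

events = [("recipe%d" % i, 3 + (i % 2)) for i in range(40)]  # alternating 3s and 4s
curve = learning_curve(events, range(5, 36, 5), fit, predict)
print(len(curve))  # 7
```

Averaging such per-user curves over the selected group of users (71 for Epicurious, 1571 or 269 for Food.com) produces the learning curves shown in Figures 5.8 to 5.10.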
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and adding the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that best suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example: season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
In the formula, pred(u, a) is the prediction value for user u and item a, and N is the set of items rated by user u. Using the set of the user's rated items, the user rating for each item b is weighted according to the similarity between b and the target item a. The predicted rating is obtained by normalizing this weighted sum by the sum of similarities.
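A sketch of this weighted prediction, with hypothetical precomputed item-item similarities:

```python
def predict_item_based(user_ratings, target, sim):
    """pred(u, a): weight the user's ratings by the similarity between each
    rated item b and the target item a, normalized by the sum of similarities."""
    numerator = sum(sim(target, b) * r for b, r in user_ratings.items())
    denominator = sum(abs(sim(target, b)) for b in user_ratings)
    return numerator / denominator if denominator else 0.0

# Hypothetical precomputed similarities to the target recipe
similarities = {("pasta", "pizza"): 0.5, ("pasta", "salad"): 0.5}
sim = lambda a, b: similarities.get((a, b), 0.0)
ratings = {"pizza": 4, "salad": 2}  # the set N of items rated by the user
print(predict_item_based(ratings, "pasta", sim))  # (0.5*4 + 0.5*2) / 1.0 = 3.0
```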
The item-based approach was chosen for the YoLP collaborative recommendation component because it is computationally more efficient when recommending a fixed group of recipes. Recommendations in YoLP are limited to the recipes of the restaurant where the user is located, so it is simpler to measure the similarity between the user's rated recipes and the restaurant recipes and compute the predicted ratings from there. Another reason why the item-based collaborative approach was chosen was already mentioned in Section 2.1.2: empirical evidence has been presented suggesting that item-based algorithms can provide, with better computational performance, comparable or better quality results than the best available user-based collaborative filtering algorithms [16, 15].
4.2 YoLP Content-Based Recommendation Component
The YoLP content-based component generates recommendations by comparing the restaurant's recipes' features with the user profile, using the cosine similarity measure (Eq. 2.5). The recommended recipes are ordered from most to least similar. In this case, instead of referring to recipes as vectors of words, recipes are represented by vectors of different features. The features that compose a recipe are: category, region, restaurant ID, and ingredients. Context features are also considered at the moment of the recommendation; these are: temperature, period of the day, and season of the year. Each feature has a specific location attributed to it in the recipe and user profile sparse vectors. The user profile is composed of binary values of the features of the recipes that the user positively rated, i.e., when a user rates a recipe with a value of 4 or 5, all the recipe features are added as binary values to the profile vector.
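The profile construction and similarity ranking can be sketched as follows. With binary values, the cosine reduces to the feature overlap over the product of vector norms; the 4-or-5 positive threshold follows the description above, while the feature names are illustrative.

```python
import math

def build_profile(rated_recipes, positive_threshold=4):
    """User profile: binary vector (a set) of the features of positively
    rated recipes, i.e. recipes rated 4 or 5."""
    profile = set()
    for features, rating in rated_recipes:
        if rating >= positive_threshold:
            profile.update(features)
    return profile

def cosine(profile, recipe_features):
    # With binary vectors, the dot product is the size of the overlap
    if not profile or not recipe_features:
        return 0.0
    overlap = len(profile & recipe_features)
    return overlap / (math.sqrt(len(profile)) * math.sqrt(len(recipe_features)))

rated = [({"cat:pasta", "reg:north", "ing:tomato"}, 5),
         ({"cat:dessert", "ing:sugar"}, 2)]  # rated below 4: not added
profile = build_profile(rated)
menu = {"carbonara": {"cat:pasta", "ing:egg"},
        "tiramisu": {"cat:dessert", "ing:sugar"}}
ranked = sorted(menu, key=lambda r: cosine(profile, menu[r]), reverse=True)
print(ranked[0])  # carbonara
```

The ordered list `ranked` corresponds to the most-to-least-similar recommendation list described above.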
YoLP recipe recommendations take the form of a list, and in order to use the Epicurious and Food.com datasets to validate the algorithms, a rating value is needed. In collaborative recommendation, the list is ordered by the predicted ratings, so the MAE and RMSE measures can be directly calculated. However, in the content-based method, the recipes are ordered by the similarity values between the recipe feature vector and the user profile vector. In order to transform the similarity measure into a rating, the combined user and item average was used. The formula applied was the following:
Rating =
    avgTotal + 0.5,   if similarity > 0.8
    avgTotal,         otherwise
(4.3)
Where avgTotal represents the combined user and item average for each recommendation. It is important to notice that the test results presented in Chapter 5 for the YoLP content-based method are an approximation to the real values, since it is likely that this method of transforming a similarity measure into a rating introduces a small error in the results. Another approximation is the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods, using the well-known Rocchio algorithm. The idea of considering ingredients in a recipe as similar to words in a document led to the variation of TF-IDF weights developed in [3]. This work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favourite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches to build the users' prototype vectors are introduced, and lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. 3.1 and used to estimate the user's favourite ingredients, is an interesting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the Frequency of use of the feature, F_k, is assumed to be always 1. The main reason is the inexistence of timestamps in the datasets' reviews, which does not allow determining the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as mentioned in [3]:
IRF_k = log(M / M_k)    (4.4)
Where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
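Eq. 4.4 can be sketched as follows, with hypothetical recipes represented as sets of features:

```python
import math

def irf_weights(recipes):
    """IRF_k = log(M / M_k), where M is the total number of recipes and
    M_k is the number of recipes containing feature k (Eq. 4.4)."""
    m = len(recipes)
    counts = {}
    for features in recipes:
        for k in set(features):
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(m / mk) for k, mk in counts.items()}

recipes = [{"garlic", "pasta"}, {"garlic", "rice"}, {"basil", "pasta"}, {"garlic"}]
weights = irf_weights(recipes)
print(weights["basil"] > weights["garlic"])  # rarer features weigh more: True
```

As with IDF in text retrieval, features appearing in many recipes receive low weights, while rare, discriminative features receive high weights.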
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine whether a rating event is considered a positive or negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations, and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied to this dataset, with the exception of ratings equal to 3: in this case, these are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach utilizes the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation, and if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
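The construction just described can be sketched as follows, with positive observations adding the recipe's IRF weights to the prototype and negative observations subtracting them; names and data are illustrative only:

```python
def build_prototype(rated_recipes, irf):
    """Sketch of the prototype-vector construction: each rated recipe is a
    (feature set, is_positive) pair; positive observations add the recipe's
    IRF feature weights, negative ones subtract them (both with weight 1)."""
    prototype = {}
    for features, positive in rated_recipes:
        sign = 1.0 if positive else -1.0
        for k in features:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype

# Hypothetical IRF weights and two rated recipes (one liked, one disliked):
irf = {"tomato": 0.3, "basil": 0.7, "cheese": 1.1}
proto = build_prototype([({"tomato", "basil"}, True),
                         ({"tomato", "cheese"}, False)], irf)
# tomato cancels out (+0.3 - 0.3), basil stays positive, cheese negative
```

Features shared by liked and disliked recipes thus cancel out, while features unique to one class dominate the prototype.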
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into a value B which fits in the range [C, D], as shown in the following formula3:

B = (A - min(A)) / (max(A) - min(A)) * (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) - min(A)), the user average was used as the default for the recommendation.
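A minimal sketch of this per-user mapping, including the fallback to the user average when the similarity interval is degenerate, might look as follows (names are illustrative):

```python
def min_max_rating(sim, sim_min, sim_max, r_min, r_max, fallback):
    """Map a similarity value A from the user's own similarity range
    [sim_min, sim_max] into the user's rating range [r_min, r_max]
    (Eq. 4.5). When the similarity interval is degenerate (not enough
    ratings to compute it), fall back to the user's average rating."""
    if sim_max == sim_min:
        return fallback
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min

# A similarity of 0.5 in [0, 1] maps to the middle of the 1..4 Epicurious scale:
pred = min_max_rating(0.5, 0.0, 1.0, 1.0, 4.0, fallback=3.2)
```

The key point is that `sim_min`/`sim_max` and `r_min`/`r_max` are computed per user, so two users with the same raw similarity can receive different predicted ratings.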
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U    (4.6)
         average rating - standard deviation,   if similarity < L
Three different approaches were tested: using the user's rating average and the user standard deviation; using the recipe's rating average and the recipe standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performances; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                     Food.com    Epicurious
Number of users                        24741          8117
Number of food items                  226025         14976
Number of rating events               956826         86574
Number of ratings above avg           726467         46588
Number of groups                         108            68
Number of ingredients                   5074           338
Number of categories                      28            14
Sparsity on the ratings matrix         0.02%         0.07%
Avg rating values                       4.68          3.34
Avg number of ratings per user         38.67         10.67
Avg number of ratings per item          4.23          5.78
Avg number of ingredients per item      8.57          3.71
Avg number of categories per item       2.33          0.60
Avg number of food groups per item      0.87          0.61
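A direct transcription of Eq. 4.6 with the initial thresholds might look as follows; the average and standard deviation arguments can be the user's, the item's, or the combined values, as described above:

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: map a Rocchio similarity value into a rating using an
    average rating and a standard deviation (user's, item's, or combined).
    The defaults are the initial thresholds U = 0.75 and L = 0.25."""
    if sim >= upper:
        return avg + std
    if sim >= lower:
        return avg
    return avg - std

# Hypothetical user with average rating 3.0 and standard deviation 0.5:
high = rating_from_similarity(0.8, avg=3.0, std=0.5)  # 3.5
mid = rating_from_similarity(0.5, avg=3.0, std=0.5)   # 3.0
low = rating_from_similarity(0.1, avg=3.0, std=0.5)   # 2.5
```

The prediction is thus always anchored on the average, and the similarity only decides whether one standard deviation is added, subtracted, or omitted.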
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets after the filtering is presented.
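One way the iterative nature of such a filter can be sketched (removing a sparse recipe can push a user below the threshold, and vice versa, so the passes are repeated until the dataset is stable) is the following; the function is a hypothetical illustration, not the actual preprocessing code:

```python
from collections import Counter

def filter_sparse(events, min_recipe=4, min_user=6):
    """Iteratively drop recipes rated no more than 3 times and users with
    no more than 5 ratings (i.e. keep recipes with >= 4 ratings and users
    with >= 6), repeating until no further removals occur."""
    events = list(events)
    while True:
        users = Counter(u for u, r, _ in events)
        recipes = Counter(r for u, r, _ in events)
        kept = [(u, r, s) for u, r, s in events
                if users[u] >= min_user and recipes[r] >= min_recipe]
        if len(kept) == len(events):
            return kept
        events = kept

# Hypothetical toy dataset: 6 users rating 6 recipes each, plus one
# user "x" with a single rating, which the filter removes.
events = [("u%d" % i, "r%d" % j, 5) for i in range(6) for j in range(6)]
events.append(("x", "r0", 1))
kept = filter_sparse(events)
```

The fixed point is needed because each removal can invalidate counts computed in the previous pass.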
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.

• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.

4 http://www.food.com
5 http://www.epicurious.com

Figure 4.3: Distribution of Epicurious rating events per rating values

Figure 4.4: Distribution of Food.com rating events per rating values
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

In Figures 4.3, 4.4, and 4.5, some graphical statistical data on the datasets is presented. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
32
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
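The fold construction described above can be sketched as follows, with each fold holding roughly 20% of the shuffled rating events; the helper is illustrative, not the actual evaluation module:

```python
import random

def five_fold_splits(ratings, seed=0):
    """Sketch of 5-fold cross-validation: shuffle the rating events,
    split them into 5 folds, and yield each (training, validation)
    pair, where the validation fold is 20% of the data and the
    remaining 80% form the training set."""
    events = list(ratings)
    random.Random(seed).shuffle(events)
    folds = [events[i::5] for i in range(5)]
    for i in range(5):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation

# Hypothetical list of 10 (user, item, rating) events:
ratings = [("u1", "r%d" % k, 4) for k in range(10)]
splits = list(five_fold_splits(ratings))
```

Averaging the MAE and RMSE over the 5 resulting validation folds gives the cross-validated error estimates reported in this chapter.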
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
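For reference, the two error measures can be computed over (predicted, actual) pairs as follows; this is the standard formulation of MAE and RMSE, not code from the evaluation module:

```python
import math

def mae(pairs):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error; squaring means larger deviations
    weigh more heavily than in MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

# Hypothetical predictions vs. actual ratings:
pairs = [(3.5, 4), (2.0, 2), (4.0, 3)]
# MAE = (0.5 + 0 + 1) / 3 = 0.5; RMSE = sqrt((0.25 + 0 + 1) / 3)
```

The different weighting of large deviations is what later explains the diverging MAE and RMSE behaviour in the threshold variation test (Section 5.4).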
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                  Epicurious          Food.com
                                  MAE      RMSE       MAE      RMSE
YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
User Average                      0.6315   0.8338     0.4077   0.6207
Item Average                      0.7701   1.0930     0.4385   0.7043
Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                              Epicurious                          Food.com
                              Observation    Observation         Observation    Observation
                              User Average   Fixed Threshold     User Average   Fixed Threshold
                              MAE    RMSE    MAE    RMSE         MAE    RMSE    MAE    RMSE
User Avg + User Standard
Deviation                     0.8217 1.0606  0.7759 1.0283       0.4448 0.6812  0.4287 0.6624
Item Avg + Item Standard
Deviation                     0.8914 1.1550  0.8388 1.1106       0.4561 0.7251  0.4507 0.7207
User/Item Avg + User and
Item Standard Deviation       0.8304 1.0296  0.7824 0.9927       0.4390 0.6506  0.4324 0.6449
Min-Max                       0.8539 1.1533  0.7721 1.0705       0.6648 0.9847  0.6303 0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio's algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also as detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although these first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious          Food.com
                                    MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries   0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine               0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary               0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                   0.8266   1.0616     0.4324   0.7087
Ingredients                         0.7932   1.0054     0.4411   0.6537
Cuisine                             0.8553   1.0810     0.5357   0.7431
Dietary                             0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
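The per-feature-type storage described above can be sketched as follows: three sparse vectors per user, merged on demand before the cosine similarity is computed (names and values are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge(*vectors):
    """Merge the per-feature-type prototype vectors (e.g. ingredients,
    cuisines, dietaries) selected for a given feature test."""
    merged = {}
    for vec in vectors:
        for k, w in vec.items():
            merged[k] = merged.get(k, 0.0) + w
    return merged

# Hypothetical stored vectors for one user; an "Ingredients + Cuisine"
# test simply merges those two and ignores the dietaries vector.
ingredients = {"tomato": 1.0}
cuisines = {"italian": 0.5}
prototype = merge(ingredients, cuisines)
sim = cosine(prototype, {"tomato": 1.0, "italian": 0.5})
```

Because the three vectors live in disjoint feature spaces, merging is a plain dictionary union, so every feature combination can be evaluated without rebuilding any prototype.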
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like for example the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U    (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
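The sweep just described can be sketched as the following loop, scoring the Eq. 5.1 predictions with MAE and RMSE for each candidate upper threshold U; the (similarity, actual rating) pairs and the fixed average and standard deviation are hypothetical simplifications of the per-user values used in the actual experiments:

```python
import math

def sweep_upper_threshold(events, avg, std, thresholds):
    """For each candidate upper threshold U, predict ratings with
    Eq. 5.1 (avg + std if sim >= U, else avg) over (similarity,
    actual rating) pairs, and record the resulting MAE and RMSE."""
    results = {}
    for u in thresholds:
        pairs = [((avg + std) if sim >= u else avg, actual)
                 for sim, actual in events]
        m = sum(abs(p - a) for p, a in pairs) / len(pairs)
        r = math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))
        results[u] = (m, r)
    return results

# Hypothetical validation pairs (similarity, actual rating):
events = [(0.9, 4), (0.6, 3), (0.2, 3)]
results = sweep_upper_threshold(events, avg=3.0, std=1.0,
                                thresholds=[0.5, 0.75])
# With U = 0.75 only the first event receives avg + std, matching all
# three actual ratings; with U = 0.5 the second prediction overshoots.
```

Comparing `results` across thresholds is exactly the kind of trade-off the figures visualize: lowering U can change which predictions hit exactly and which miss by a full standard deviation.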
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews are made. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
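The simulation described in this section can be sketched as the following loop, which moves one review per round from the validation set into the training set and records the mean absolute error over the remainder; the three callbacks are placeholders for the actual prototype-building and prediction steps, not the thesis code:

```python
def learning_curve(rated, build_prototype, predict, actual):
    """Simulate continuous learning: at round n the first n reviews form
    the training set, the rest the validation set, and the mean absolute
    error over the validation reviews is recorded."""
    errors = []
    for n in range(1, len(rated)):
        training, validation = rated[:n], rated[n:]
        model = build_prototype(training)
        errs = [abs(predict(model, item) - actual(item)) for item in validation]
        errors.append(sum(errs) / len(errs))
    return errors

# Toy stand-in where "reviews" are plain ratings and the "model" is just
# their running average; the real experiment plugs in Rocchio instead.
rated = [1, 2, 3, 4]
curve = learning_curve(rated,
                       build_prototype=lambda tr: sum(tr) / len(tr),
                       predict=lambda model, item: model,
                       actual=lambda item: item)
```

Plotting `curve` against the training-set size gives exactly the kind of learning curve shown in Figures 5.8 to 5.10.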
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which are chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding to this the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., datasets that contain user reviews, allowing the validation of the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling
and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A
1026501525781
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
51
where avgTotal represents the combined user and item average for each recommendation. It is important to note that the test results presented in Chapter 5 for the YoLP content-based method are an approximation of the real values, since this method of transforming a similarity measure into a rating is likely to introduce a small error in the results. Another approximation comes from the fact that YoLP considers context features at the moment of the recommendation, and these are not included in the Epicurious and Food.com datasets, as will be explained in further detail in Section 4.4.
4.3 Experimental Recommendation Component
This component represents the main focus of this work. It implements variations of content-based methods using the well-known Rocchio algorithm. The idea of treating ingredients in a recipe as analogous to words in a document led to the variation of TF-IDF weights developed in [3]. That work presented good results in retrieving the user's favourite ingredients, which raised the following question: could these results be further improved? As previously mentioned, the TF-IDF scheme can be used to attribute weights to words when using the popular Rocchio algorithm. Instead of simply obtaining the users' favorite ingredients using the TF-IDF variation [3], the user's overall preference in ingredients could be estimated through the prototype vector, which represents the learning in Rocchio's algorithm. These vectors would contain the users' preferences, where the positive and negative examples are obtained directly from the user's rated recipes/dishes. In this section, the method used to compute the feature weights to be used in Rocchio's algorithm is presented. Next, two different approaches are introduced to build the users' prototype vectors and, lastly, the problem of transforming a similarity measure into a rating value is presented and the solutions explored in this work are detailed.
4.3.1 Rocchio's Algorithm using FF-IRF
As mentioned in Section 3.1 of the related work, the approach inspired by TF-IDF, shown in Eq. (3.1) and used to estimate the user's favourite ingredients, is an interesting starting point to further explore in food recommendation. Since Rocchio's algorithm uses feature weights to build the prototype vectors representing the user's preferences, and FF-IRF has shown good results for extracting the user's favourite ingredients, this measure could be used to attribute weights to the recipe's features and build the prototype vectors. In this work, the frequency of use of the feature Fk is assumed to be always 1. The main reason is the absence of timestamps in the dataset's reviews, which makes it impossible to determine the number of times that a feature is preferred during a period D. The Inverse Recipe Frequency is used exactly as defined in [3]:
IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
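As a concrete illustration, the IRF weighting of Eq. (4.4) can be computed in a few lines. The sketch below assumes recipes are represented as sets of ingredient names; the data and function names are illustrative, not taken from the YoLP implementation:

```python
import math
from collections import Counter

def irf_weights(recipes):
    """Compute IRF_k = log(M / M_k) for every ingredient k (Eq. 4.4).

    `recipes` is a list of ingredient sets; M is the corpus size and
    M_k the number of recipes that contain ingredient k.
    """
    M = len(recipes)
    doc_freq = Counter(ing for recipe in recipes for ing in recipe)
    return {ing: math.log(M / M_k) for ing, M_k in doc_freq.items()}

recipes = [{"tomato", "basil", "garlic"},
           {"tomato", "beef", "onion"},
           {"garlic", "onion", "pepper"},
           {"tomato", "pepper", "rice"}]
weights = irf_weights(recipes)
# "tomato" appears in 3 of the 4 recipes, so its weight is log(4/3);
# a rare ingredient such as "rice" gets the higher weight log(4/1).
```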
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determine the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. To determine whether a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered negative observations and the higher rating values positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3, which are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set: if a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are obtained directly from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector: in positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector; in negative observations, the feature weights are subtracted.
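The update just described can be sketched as follows, with ratings at or above a fixed threshold of 3 counting as positive observations and both observation types carrying an equal weight of 1, as in the experiments. The data layout and names are illustrative assumptions:

```python
from collections import defaultdict

def build_prototype(rated_recipes, irf, threshold=3):
    """Build a user prototype vector from rated recipes, Rocchio-style.

    `rated_recipes` is a list of (ingredient_set, rating) pairs and
    `irf` maps each ingredient to its IRF weight. Ratings at or above
    `threshold` add the recipe's feature weights to the prototype;
    lower ratings subtract them.
    """
    prototype = defaultdict(float)
    for ingredients, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0   # equal weight of 1
        for ing in ingredients:
            prototype[ing] += sign * irf.get(ing, 0.0)
    return dict(prototype)
```

For the second approach described above, `threshold` would instead be the user's average rating computed on the training set.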
4.3.3 Generating a rating value from a similarity value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets with relevant information on the recipes, containing rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented: Rocchio's algorithm returns a similarity value between the recipe features vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into a value B that fits in the range [C, D], as shown in the following formula³:

B = (A − min(A)) / (max(A) − min(A)) × (D − C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were therefore applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale is mapped for each user into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) − min(A)), the user average was used as the default for the recommendation.
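A minimal sketch of this per-user mapping, including the fallback to the user average when the similarity interval cannot be computed (all names are illustrative):

```python
def minmax_rating(sim, sim_min, sim_max, r_min, r_max, user_avg):
    """Map a similarity value into the rating range [r_min, r_max]
    via Min-Max normalization (Eq. 4.5).

    Falls back to the user's average rating when the per-user
    similarity interval is degenerate (too few ratings to define it).
    """
    if sim_max <= sim_min:          # interval cannot be computed
        return user_avg
    return (sim - sim_min) / (sim_max - sim_min) * (r_max - r_min) + r_min

# A similarity halfway through the user's observed interval maps to the
# middle of the rating scale (here, the Epicurious range [1, 4]).
predicted = minmax_rating(0.5, 0.0, 1.0, 1, 4, user_avg=3.0)
```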
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U     (4.6)
         average rating − standard deviation,   if similarity < L
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and recipe averages and standard deviations.
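Eq. (4.6) translates directly into a three-way rule. A sketch using the initial thresholds U = 0.75 and L = 0.25, where the average and standard deviation may come from the user, the item, or their combination, as listed above (names are illustrative):

```python
def rating_from_similarity(sim, avg, std, upper=0.75, lower=0.25):
    """Transform a Rocchio similarity value into a rating (Eq. 4.6)."""
    if sim >= upper:
        return avg + std        # strong match: rate above the average
    if sim >= lower:
        return avg              # middling match: fall back to the average
    return avg - std            # weak match: rate below the average
```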
This approach is very intuitive: when the similarity value between the recipe's features and the user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity thresholds used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.

³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html

Table 4.1: Statistical characterization for the datasets used in the experiments

                                      Food.com    Epicurious
Number of users                       24,741      8,117
Number of food items                  226,025     14,976
Number of rating events               956,826     86,574
Number of ratings above avg           726,467     46,588
Number of groups                      108         68
Number of ingredients                 5,074       338
Number of categories                  28          14
Sparsity of the ratings matrix        0.02%       0.07%
Avg rating value                      4.68        3.34
Avg number of ratings per user        38.67       10.67
Avg number of ratings per item        4.23        5.78
Avg number of ingredients per item    8.57        3.71
Avg number of categories per item     2.33        0.60
Avg number of food groups per item    0.87        0.61
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe-sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51,324 active users and 160,536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as all users with no more than 5 ratings. Table 4.1 presents a statistical characterization for the two datasets after the filter was applied.
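The sparsity filter can be sketched as a fixed-point loop; removing sparse recipes can push users below their threshold (and vice versa), so the loop repeats until the dataset is stable. This is our own sketch of the rule just described, not necessarily the exact procedure used to build the published dataset:

```python
from collections import Counter

def filter_sparse(ratings, min_item=4, min_user=6):
    """Drop recipes rated no more than 3 times and users with no more
    than 5 ratings, repeating until the dataset is stable, since each
    removal can invalidate the other rule.

    `ratings` is a list of (user_id, recipe_id, rating) triples.
    """
    while True:
        user_counts, item_counts = Counter(), Counter()
        for user, item, _ in ratings:
            user_counts[user] += 1
            item_counts[item] += 1
        kept = [(u, i, r) for u, i, r in ratings
                if user_counts[u] >= min_user and item_counts[i] >= min_item]
        if len(kept) == len(ratings):   # nothing removed: fixed point reached
            return kept
        ratings = kept
```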
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com   ⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines and dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.

Figures 4.3, 4.4, and 4.5 present some graphical statistical data on the datasets. Figures 4.3 and 4.4 display the distribution of the rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
⁶ http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting aspects of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leaving p observations out as the validation set and keeping the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations as the validation set; ideally, it is repeated until all possible combinations are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the data was divided into 5 parts, so the process is repeated 5 times; this is known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
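The 5-fold protocol can be sketched without any external library: shuffle the rating events once, split them into 5 folds, and let each fold serve once as the validation set. A minimal illustrative sketch:

```python
import random

def five_fold_splits(events, k=5, seed=42):
    """Yield (train, validation) splits for k-fold cross-validation:
    each fold serves once as the 20% validation set, with the
    remaining folds forming the 80% training set."""
    events = list(events)
    random.Random(seed).shuffle(events)       # shuffle once, reproducibly
    folds = [events[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, folds[i]
```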
Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID

• Item identification: itemID

• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
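Both measures follow directly from the (actual, predicted) rating pairs of the validation set; squaring the deviations is what makes RMSE penalize large errors more heavily. A minimal sketch:

```python
import math

def mae(actual, predicted):
    """Mean absolute deviation between predicted and actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; larger deviations weigh more heavily."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))
```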
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg) / 2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious          Food.com
                                 MAE      RMSE       MAE      RMSE
YoLP content-based component     0.6389   0.8279     0.3590   0.6536
YoLP collaborative component     0.6454   0.8678     0.3761   0.6834
User average                     0.6315   0.8338     0.4077   0.6207
Item average                     0.7701   1.0930     0.4385   0.7043
Combined average                 0.6628   0.8572     0.4180   0.6250

Table 5.2: Test results

                                 Epicurious                                  Food.com
                                 Obs. User Avg     Obs. Fixed Thresh.        Obs. User Avg     Obs. Fixed Thresh.
                                 MAE     RMSE      MAE     RMSE              MAE     RMSE      MAE     RMSE
User avg + user std. dev.        0.8217  1.0606    0.7759  1.0283            0.4448  0.6812    0.4287  0.6624
Item avg + item std. dev.        0.8914  1.1550    0.8388  1.1106            0.4561  0.7251    0.4507  0.7207
User/item avg + user and
item std. dev.                   0.8304  1.0296    0.7824  0.9927            0.4390  0.6506    0.4324  0.6449
Min-Max                          0.8539  1.1533    0.7721  1.0705            0.6648  0.9847    0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendation. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performance were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                     Epicurious          Food.com
                                     MAE      RMSE       MAE      RMSE
Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
Ingredients                          0.7932   1.0054     0.4411   0.6537
Cuisine                              0.8553   1.0810     0.5357   0.7431
Dietary                              0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform: for each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
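The per-feature storage described above can be sketched as separate sparse vectors (dictionaries) per user, merged on demand before computing the cosine similarity; all names here are illustrative:

```python
import math

def merge_vectors(parts):
    """Merge the per-feature-type prototype vectors (e.g. ingredients,
    cuisines, dietaries) selected for a given feature test."""
    merged = {}
    for part in parts:
        for feature, weight in part.items():
            merged[feature] = merged.get(feature, 0.0) + weight
    return merged

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Testing "Ingredients + Cuisine" simply means merging those two parts:
user_ingredients = {"garlic": 1.2, "tomato": 0.4}
user_cuisines = {"italian": 0.9}
profile = merge_vectors([user_ingredients, user_cuisines])
```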
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information about the items is available. Although this is confirmed in this test (see Table 5.3), that may not always be the case: some features, like for example the price of a meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. (4.6), previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating − standard deviation,   if similarity < L
The initial threshold values, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph in Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. (4.6) was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U     (5.1)
Using Eq. (5.1), it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset
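The sweep itself amounts to a grid search over the upper threshold U, re-applying the prediction rule of Eq. (5.1) at each point and recomputing MAE and RMSE. A sketch, under the assumption that the per-recommendation similarities, averages, deviations, and actual ratings have already been collected from the validation folds:

```python
import math

def sweep_upper_threshold(samples, thresholds):
    """For each candidate upper threshold U, predict ratings with
    Eq. (5.1) and report (U, MAE, RMSE).

    `samples` is a list of (similarity, avg_rating, std_dev, actual_rating)
    tuples gathered from the validation folds.
    """
    results = []
    for u in thresholds:
        errors = [(avg + std if sim >= u else avg) - actual
                  for sim, avg, std, actual in samples]
        n = len(errors)
        mae = sum(abs(e) for e in errors) / n
        rmse = math.sqrt(sum(e * e for e in errors) / n)
        results.append((u, mae, rmse))
    return results
```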
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users who attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users who rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1,571 users were found who rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users who rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
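The round-by-round protocol described above can be sketched as follows; the `train_and_error` callback is a hypothetical stand-in for building the user's prototype vector and measuring the recommendation error:

```python
def learning_curve(user_reviews, train_and_error):
    """Sketch of the learning-curve protocol: in each round, one more
    review moves from the validation set into the training set, and the
    recommendation error is measured on the reviews that remain.

    train_and_error(training, validation) is assumed to build the user's
    prototype vector from `training` and return the error on `validation`.
    """
    errors = []
    for n in range(1, len(user_reviews)):
        training, validation = user_reviews[:n], user_reviews[n:]
        errors.append(train_and_error(training, validation))
    return errors
```

Averaging these per-user error curves over the selected group of users gives the curves plotted in Figures 5.8 to 5.10.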
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results to transform the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in the dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors, created for each user, could represent their preferences. From the user's rated recipes, each class would contain the feature weights related with a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
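A minimal sketch of this class-vector idea, assuming recipes are represented as sparse dictionaries of IRF feature weights (all names here are illustrative, not an implementation from this work):

```python
from collections import defaultdict

def build_class_vectors(rated_recipes):
    """Build one prototype vector per rating class.

    rated_recipes: list of (rating, feature_weights) pairs, where
    feature_weights maps feature name -> IRF weight.
    """
    class_vectors = defaultdict(lambda: defaultdict(float))
    for rating, features in rated_recipes.copy() if hasattr(rated_recipes, "copy") else rated_recipes:
        for feature, weight in features.items():
            class_vectors[rating][feature] += weight
    return class_vectors

def cosine(u, v):
    dot = sum(u[f] * v.get(f, 0.0) for f in u)
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(class_vectors, recipe_features):
    # The class (rating value) whose vector is most similar to the
    # recipe directly becomes the predicted rating.
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_features))
```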
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems: A landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pages 127–134, 2002. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
IRF_k = log(M / M_k)    (4.4)

where M is the total number of recipes and M_k is the number of recipes that contain ingredient k. Essentially, all the recipes' feature weights were computed using only the IRF, determined over the complete dataset.
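As an illustration, Eq. 4.4 can be computed with a few lines of code; this is a sketch over toy data, not the implementation used in this work:

```python
import math

def irf_weights(recipes):
    """Compute IRF_k = log(M / M_k) for every ingredient k.

    recipes: list of ingredient sets; returns {ingredient: weight}.
    """
    M = len(recipes)
    counts = {}
    for ingredients in recipes:
        for k in ingredients:
            counts[k] = counts.get(k, 0) + 1
    return {k: math.log(M / Mk) for k, Mk in counts.items()}

recipes = [{"garlic", "beef"}, {"garlic", "basil"}, {"beef"}, {"garlic"}]
weights = irf_weights(recipes)
# "garlic" appears in 3 of 4 recipes, so it gets a lower weight than "basil"
```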
4.3.2 Building the Users' Prototype Vector
The prototype vector is built directly from the user's rated items. The type of observation, positive or negative, and the weight attributed to each, determines the impact that a rated recipe has on the user's prototype vector. In the experiments performed in this work, positive and negative observations have an equal weight of 1. In order to determine if a rating event is considered a positive or a negative observation, two different approaches were studied. The first approach is simple: the lower rating values are considered a negative observation and the higher rating values are positive observations. In the Epicurious dataset, ratings vary from 1 to 4, so 1 and 2 are considered negative observations and 3 and 4 are positive observations. In the Food.com dataset, ratings range from 1 to 5; the same process is applied, with the exception of ratings equal to 3: these are considered neutral observations and are ignored. Both datasets used in the experiments will be explained in detail in Section 4.4. The second approach uses the user's average rating value, computed from the training set. If a rating event is lower than the user's average rating, it is considered a negative observation; if it is equal or higher, it is considered a positive observation.
As explained in detail in Section 2.1.1, the prototype vector represents the user's preferences. These are directly obtained from the rating events contained in the training set. Depending on the observation, the recipe's feature weights are added to or subtracted from the user's prototype vector. In positive observations, the recipe's feature weights, determined by the IRF value, are added to the vector. In negative observations, the feature weights are subtracted.
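The construction described above can be sketched as follows, assuming recipes are ingredient sets and IRF weights come from Eq. 4.4 (a simplified sketch using the fixed-threshold variant; names are illustrative):

```python
def build_prototype(rated_recipes, irf, threshold=3):
    """Add the IRF feature weights of positively rated recipes to the
    user's prototype vector and subtract those of negatively rated ones;
    both observation types carry an equal weight of 1.
    """
    prototype = {}
    for rating, ingredients in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0  # positive vs. negative observation
        for k in ingredients:
            prototype[k] = prototype.get(k, 0.0) + sign * irf.get(k, 0.0)
    return prototype
```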
4.3.3 Generating a Rating Value from a Similarity Value
Datasets that contain user reviews on recipes or restaurant meals are presently very hard to find. Epicurious and Food.com, which will be presented in the next section, are food-related datasets, with relevant information on the recipes, that contain rating events from users to recipes. In order to validate the methods explored in this work, the recommendation system also needs to return a rating value. This problem was already mentioned when the YoLP content-based component was presented. Rocchio's algorithm returns a similarity value between the recipe feature vector and the user profile vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches are presented to translate the similarity value into a rating.
Min-Max method
The similarity values needed to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula³:

B = (A - min(A)) / (max(A) - min(A)) * (D - C) + C    (4.5)
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. So, the following steps were applied: compute all the users' similarity variation from the validation set, and compute all the users' rating variation from the training set. At this point, the similarity scale is mapped, for each user, into the rating range, and the Min-Max normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (max(A) - min(A)), the user average was used as default for the recommendation.
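The per-user mapping can be sketched as follows (illustrative names; `sims` and `ratings` stand for the user's similarity values from the validation set and rating values from the training set):

```python
def min_max(a, a_min, a_max, c, d):
    """Min-Max normalization (Eq. 4.5): map a from [a_min, a_max] into [c, d]."""
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def map_similarity_to_rating(similarity, sims, ratings):
    """Per-user sketch: when the similarity interval collapses (too few
    ratings), fall back to the user's average rating.
    """
    if max(sims) == min(sims):
        return sum(ratings) / len(ratings)
    return min_max(similarity, min(sims), max(sims), min(ratings), max(ratings))
```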
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L      (4.6)
Three different approaches were tested: using the user's rating average and the user's standard deviation; using the recipe's rating average and the recipe's standard deviation; and using the combined average of the user and the recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
³ http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization for the datasets used in the experiments.

                                   Food.com   Epicurious
Number of users                    24741      8117
Number of food items               226025     14976
Number of rating events            956826     86574
Number of ratings above avg.       726467     46588
Number of groups                   108        68
Number of ingredients              5074       338
Number of categories               28         14
Sparsity of the ratings matrix     0.02%      0.07%
Avg. rating value                  4.68       3.34
Avg. number of ratings per user    38.67      10.67
Avg. number of ratings per item    4.23       5.78
Avg. ingredients per item          8.57       3.71
Avg. categories per item           2.33       0.60
Avg. food groups per item          0.87       0.61
user profile is high, the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
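Eq. 4.6 translates directly into code; `avg` and `std` may come from the user, the item, or their combination, as tested above (a sketch with the initial thresholds as defaults):

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    """Eq. 4.6: push the predicted rating one standard deviation above or
    below the average, depending on where the similarity falls.
    """
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std
```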
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online⁴ recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious⁵. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes rated no more than 3 times were removed, as well as the users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
⁴ http://www.food.com
⁵ http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating values.

Figure 4.4: Distribution of Food.com rating events per rating values.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented. In Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients for a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL⁶. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
⁶ http://www.mysql.com
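The storage layout can be sketched as a minimal relational schema; the table and column names below are hypothetical, and SQLite is used only to keep the example self-contained (the actual system uses MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rating_events (user_id INTEGER, recipe_id INTEGER, rating REAL);
-- feature_type is 'ingredient', 'cuisine' or 'dietary'
CREATE TABLE recipe_features (recipe_id INTEGER, feature TEXT, feature_type TEXT, weight REAL);
CREATE TABLE prototype_vectors (user_id INTEGER, feature TEXT, weight REAL);
""")
conn.execute("INSERT INTO rating_events VALUES (1, 10, 4.0)")
user_avg = conn.execute(
    "SELECT AVG(rating) FROM rating_events WHERE user_id = 1"
).fetchone()[0]
```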
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, this process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
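The 5-fold protocol can be sketched as follows (an illustrative splitting scheme, not the code used in this work; a real run would shuffle the rating events first):

```python
def k_fold_splits(events, k=5):
    """Yield (training, validation) pairs: each fold holds out 1/k of the
    rating events (20% for k=5) and trains on the remaining 80%.
    """
    folds = [events[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield training, validation
```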
Accuracy is measured when comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example.
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the generated predictions created by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
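For reference, the two measures can be written as follows (a standard formulation, taking lists of predicted and actual ratings):

```python
def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on larger ones.
    """
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)) ** 0.5
```

Note that two error profiles with the same MAE can differ in RMSE: per-item errors of [1, 1] and [0, 2] both give an MAE of 1, but RMSEs of 1 and √2.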
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines.

                              Epicurious         Food.com
                              MAE      RMSE      MAE      RMSE
YoLP Content-based component  0.6389   0.8279    0.3590   0.6536
YoLP Collaborative component  0.6454   0.8678    0.3761   0.6834
User Average                  0.6315   0.8338    0.4077   0.6207
Item Average                  0.7701   1.0930    0.4385   0.7043
Combined Average              0.6628   0.8572    0.4180   0.6250
Table 5.2: Test results.

                               Epicurious                          Food.com
                               Observation:      Observation:      Observation:      Observation:
                               User Average      Fixed Threshold   User Average      Fixed Threshold
                               MAE      RMSE     MAE      RMSE     MAE      RMSE     MAE      RMSE
User Avg + User Standard
Deviation                      0.8217   1.0606   0.7759   1.0283   0.4448   0.6812   0.4287   0.6624
Item Avg + Item Standard
Deviation                      0.8914   1.1550   0.8388   1.1106   0.4561   0.7251   0.4507   0.7207
User/Item Avg + User and
Item Standard Deviation        0.8304   1.0296   0.7824   0.9927   0.4390   0.6506   0.4324   0.6449
Min-Max                        0.8539   1.1533   0.7721   1.0705   0.6648   0.9847   0.6303   0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved
Table 5.3: Testing features.

                                   Epicurious         Food.com
                                   MAE      RMSE      MAE      RMSE
Ingredients + Cuisine + Dietaries  0.7824   0.9927    0.4324   0.6449
Ingredients + Cuisine              0.7915   1.0012    0.4384   0.6502
Ingredients + Dietary              0.7874   0.9986    0.4342   0.6468
Cuisine + Dietary                  0.8266   1.0616    0.4324   0.7087
Ingredients                        0.7932   1.0054    0.4411   0.6537
Cuisine                            0.8553   1.0810    0.5357   0.7431
Dietary                            0.8772   1.0807    0.4579   0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine if all features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so, when computing the user prototype vector, the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
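This representation can be sketched as follows: one vector is stored per feature type, and only the active types are merged before computing the cosine similarity (illustrative names, not the code used in this work):

```python
def merge_vectors(stored, active=("ingredients", "cuisine", "dietary")):
    """Merge the per-feature-type prototype vectors selected for a test run.

    stored: {feature_type: {feature: weight}}; keys are namespaced with
    the feature type so that, e.g., an ingredient and a cuisine with the
    same name never collide.
    """
    merged = {}
    for ftype in active:
        for feature, weight in stored.get(ftype, {}).items():
            merged[(ftype, feature)] = weight
    return merged
```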
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them, and, although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if L <= similarity < U
         average rating - standard deviation,   if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but now other cases need to be tested. By fluctuating the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset.
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset.
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U      (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted and actual ratings is more suitable. In this test,
Figure 5.6: Mapping of the users' absolute error and standard deviation for the Epicurious dataset.
the lowest errors registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the overall lowest error rates among all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and to verify that the absolute error does not spike for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
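The proximity-average line in these figures can be approximated by binning users along the standard-deviation axis; the bin width and the toy points below are illustrative assumptions, not the dissertation's actual data.

```python
from collections import defaultdict

def trend_line(points, bin_width=0.5):
    # Average the users' absolute errors within standard-deviation
    # bins, approximating the proximity-average line in the figures.
    bins = defaultdict(list)
    for std, abs_err in points:
        bins[int(std / bin_width)].append(abs_err)
    return {round(b * bin_width, 2): sum(v) / len(v)
            for b, v in sorted(bins.items())}

# Toy points: (user standard deviation, user mean absolute error).
points = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.5), (0.9, 0.7), (1.2, 0.8)]
print(trend_line(points))
```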
Fig. 5.6 presents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, as that would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Considering the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, probably
Figure 5.7: Mapping of the users' absolute error and standard deviation for the Food.com dataset.
there was not enough data on users with high deviations for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8, it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.
the error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is not a clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
a threshold where the recommendation error stagnates.
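The round-by-round protocol used in this test can be sketched as follows; `fake_evaluate` is a hypothetical stand-in for rebuilding the prototype vector and measuring the recommendation error at each round.

```python
def learning_curve(user_reviews, evaluate):
    # Round r uses the first r reviews as the training set and the
    # remainder as the validation set, simulating preferences that
    # are built up one review at a time.
    curve = []
    for r in range(1, len(user_reviews)):
        training, validation = user_reviews[:r], user_reviews[r:]
        curve.append(evaluate(training, validation))
    return curve

# Stand-in evaluator: the error shrinks as the training set grows,
# only to illustrate the shape of the protocol.
def fake_evaluate(training, validation):
    return 1.0 / len(training)

curve = learning_curve(list(range(6)), fake_evaluate)
print(curve)
```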
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking of recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation before, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which are chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding to this the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objectives of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.
Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
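A minimal sketch of this proposal, under the assumption that each class vector simply accumulates the feature weights of the recipes rated with that value; the helper names are hypothetical, not part of the dissertation's implementation.

```python
def build_class_vectors(rated_recipes):
    # One vector per rating value: each class accumulates the
    # feature weights of the recipes the user rated with that value.
    classes = {}
    for rating, features in rated_recipes:
        vec = classes.setdefault(rating, {})
        for feature, weight in features.items():
            vec[feature] = vec.get(feature, 0.0) + weight
    return classes

def cosine(a, b):
    dot = sum(a.get(f, 0.0) * w for f, w in b.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def predict_rating(classes, recipe_features):
    # The class vector most similar to the recipe directly gives the
    # predicted rating, with no similarity-to-rating transformation.
    return max(classes, key=lambda r: cosine(classes[r], recipe_features))

classes = build_class_vectors([
    (5, {"garlic": 1.0, "pasta": 1.0}),
    (2, {"tofu": 1.0}),
])
print(predict_rating(classes, {"pasta": 1.0}))
```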
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems – a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
vector, so a method is needed to translate the similarity into a rating. This topic is very important to explore, since it can introduce considerable errors in the validation results. Next, two approaches to translate the similarity value into a rating are presented.
Min-Max method
The similarity values need to fit into a specific range of rating values. There are many types of normalization methods available; the technique chosen for this work was Min-Max Normalization. Min-Max transforms a value A into B, which fits in the range [C, D], as shown in the following formula3:

$$B = \frac{A - \min(A)}{\max(A) - \min(A)} \times (D - C) + C \quad (4.5)$$
In order to obtain the best results, the similarity and rating scales were computed individually for each user, since not all users rate items the same way or have the same notion of high or low rating values. The following steps were thus applied: compute each user's similarity variation from the validation set, and compute each user's rating variation from the training set. At this point, the similarity scale of each user is mapped into the rating range, and the Min-Max Normalization formula (Eq. 4.5) can be applied to predict a rating value for the recipe to recommend. In the cases where there were not enough user ratings to compute the similarity interval (maximum value of A minus minimum value of A), the user average was used as the default for the recommendation.
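The per-user procedure can be sketched as follows, assuming plain lists for the user's observed similarities and ratings; the function names are illustrative, not the dissertation's implementation.

```python
def min_max(a, a_min, a_max, c, d):
    # Eq. 4.5: map a value from [a_min, a_max] into [c, d].
    return (a - a_min) / (a_max - a_min) * (d - c) + c

def predict_rating(similarity, user_sims, user_ratings, user_avg):
    # Per-user scales: the similarity variation comes from the
    # validation set and the rating variation from the training set;
    # fall back to the user average when the similarity interval
    # cannot be computed.
    s_min, s_max = min(user_sims), max(user_sims)
    if s_max == s_min:
        return user_avg
    return min_max(similarity, s_min, s_max,
                   min(user_ratings), max(user_ratings))

print(predict_rating(0.6, [0.2, 0.8], [2, 5], 3.5))
```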
Using average and standard deviation values from the training set
Using the average and standard deviation values from the training set should, in theory, bring good results and introduce only a very small error. To generate a rating value, the following formula was used:
$$\text{Rating} = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases} \quad (4.6)$$
Three different approaches were tested: using the user's rating average and standard deviation; using the recipe's rating average and standard deviation; and using the combined average of the user and recipe averages and standard deviations.
This approach is very intuitive: when the similarity value between the recipe's features and the
3 http://aimotion.blogspot.pt/2009/09/data-mining-in-practice.html
Table 4.1: Statistical characterization of the datasets used in the experiments.

                                      Food.com   Epicurious
Number of users                       24741      8117
Number of food items                  226025     14976
Number of rating events               956826     86574
Number of ratings above avg           726467     46588
Number of groups                      108        68
Number of ingredients                 5074       338
Number of categories                  28         14
Sparsity on the ratings matrix        0.02%      0.07%
Avg rating values                     4.68       3.34
Avg number of ratings per user        38.67      10.67
Avg number of ratings per item        4.23       5.78
Avg number of ingredients per item    8.57       3.71
Avg number of categories per item     2.33       0.60
Avg number of food groups per item    0.87       0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later on, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance; initially, the upper threshold U is 0.75 and the lower threshold L is 0.25.
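Eq. 4.6 and the combined user/item variant can be sketched as follows, with the initial thresholds U = 0.75 and L = 0.25 as defaults; the `combined_rating` helper is an illustrative reading of how the user and item statistics might be averaged.

```python
def similarity_to_rating(similarity, avg, std, upper=0.75, lower=0.25):
    # Eq. 4.6: shift the average by one standard deviation up or
    # down, depending on where the similarity value falls.
    if similarity >= upper:
        return avg + std
    if similarity >= lower:
        return avg
    return avg - std

def combined_rating(sim, user_avg, user_std, item_avg, item_std):
    # Combined variant: average the user and item statistics first.
    return similarity_to_rating(sim, (user_avg + item_avg) / 2,
                                (user_std + item_std) / 2)

print(similarity_to_rating(0.8, 3.5, 1.0))
print(combined_rating(0.1, 4.0, 0.5, 3.0, 1.5))
```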
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes but, in order to reduce data sparsity, the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as all users who rated no more than 5 times. Table 4.1 presents a statistical characterization of the two datasets after the filter was applied.
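The filtering step can be sketched as follows, assuming the ratings are simple (user, item) events; iterating until a fixed point handles the fact that removing one event can push other counts below the limits, which is an assumption about how the filter was applied rather than a documented detail.

```python
from collections import Counter

def filter_ratings(ratings, min_item=4, min_user=6):
    # Keep recipes rated more than 3 times and users who rated more
    # than 5 times, i.e. at least 4 and 6 events respectively.
    changed = True
    while changed:  # repeat until no event falls below the limits
        items = Counter(i for _, i in ratings)
        users = Counter(u for u, _ in ratings)
        kept = [(u, i) for u, i in ratings
                if items[i] >= min_item and users[u] >= min_user]
        changed = len(kept) != len(ratings)
        ratings = kept
    return ratings

# Toy data: a dense 6x6 block of events plus one sparse user.
ratings = [(f"u{u}", f"r{i}") for u in range(6) for i in range(6)]
ratings.append(("u9", "r0"))
kept = filter_ratings(ratings)
print(len(ratings), "->", len(kept))  # the sparse u9 event is dropped
```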
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:
• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.
• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value.
Figure 4.4: Distribution of Food.com rating events per rating value.
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users.
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and of the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data but, instead of using it to train the model, use this segment to evaluate the predictions made by the system during the training phase. This procedure provides an insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set; ideally, it is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen number of folds was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.
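A minimal sketch of the 5-fold split described above, using a round-robin partition of the observations; the actual fold assignment used in the experiments is not specified in the text.

```python
def k_fold_splits(observations, k=5):
    # Partition the rating events into k folds; each fold serves once
    # as the validation set (20% of the data for k = 5), while the
    # remaining folds form the training set (the other 80%).
    folds = [observations[i::k] for i in range(k)]
    for i in range(k):
        training = [o for j, f in enumerate(folds) if j != i for o in f]
        yield training, folds[i]

data = list(range(10))
for training, validation in k_fold_splits(data, k=5):
    print(len(training), len(validation))
```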
Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information
Figure 5.1: 10-fold cross-validation example.
in the following format:
• User identification: userID;
• Item identification: itemID;
• Rating attributed by the userID to the itemID: rating.
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of the content-based algorithms.
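The two metrics can be computed directly from paired lists of predicted and actual ratings; a minimal sketch:

```python
def mae(predicted, actual):
    # Mean absolute deviation between predicted and actual ratings.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root mean squared error: larger deviations weigh more heavily.
    return (sum((p - a) ** 2
                for p, a in zip(predicted, actual)) / len(actual)) ** 0.5

predicted, actual = [4.0, 3.0, 5.0], [5.0, 3.0, 3.0]
print(mae(predicted, actual), rmse(predicted, actual))
```

Note that RMSE is always at least as large as MAE over the same errors, which is why the two curves can move in opposite directions as the threshold changes the error distribution.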
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as
Table 5.1: Baselines.

                                Epicurious        Food.com
                                MAE     RMSE      MAE     RMSE
YoLP Content-based component    0.6389  0.8279    0.3590  0.6536
YoLP Collaborative component    0.6454  0.8678    0.3761  0.6834
User Average                    0.6315  0.8338    0.4077  0.6207
Item Average                    0.7701  1.0930    0.4385  0.7043
Combined Average                0.6628  0.8572    0.4180  0.6250
Table 5.2: Test results.

                                      Epicurious                                    Food.com
                                      Observation        Observation               Observation        Observation
                                      User Average       Fixed Threshold           User Average       Fixed Threshold
                                      MAE     RMSE       MAE     RMSE              MAE     RMSE       MAE     RMSE
User Avg + User Standard Deviation    0.8217  1.0606     0.7759  1.0283            0.4448  0.6812     0.4287  0.6624
Item Avg + Item Standard Deviation    0.8914  1.1550     0.8388  1.1106            0.4561  0.7251     0.4507  0.7207
User/Item Avg + User and Item
Standard Deviation                    0.8304  1.0296     0.7824  0.9927            0.4390  0.6506     0.4324  0.6449
Min-Max                               0.8539  1.1533     0.7721  1.0705            0.6648  0.9847     0.6303  0.9384
inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold between positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by the Rocchio algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
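A minimal sketch of the prototype construction, assuming the classic Rocchio form (beta times the centroid of the positive observations minus gamma times the centroid of the negative ones) over feature-weight dictionaries; treating ratings above the fixed threshold of 3 as positive is an illustrative reading, not a documented detail.

```python
def centroid(vectors):
    # Mean feature-weight vector of a list of sparse vectors (dicts).
    total = {}
    for vec in vectors:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    n = len(vectors)
    return {f: w / n for f, w in total.items()} if n else {}

def build_prototype(rated, threshold=3, beta=1.0, gamma=1.0):
    # Rocchio-style prototype: positive observations pull the vector
    # towards their centroid, negative observations push it away.
    pos = centroid([v for r, v in rated if r > threshold])
    neg = centroid([v for r, v in rated if r <= threshold])
    return {f: beta * pos.get(f, 0.0) - gamma * neg.get(f, 0.0)
            for f in set(pos) | set(neg)}

rated = [(5, {"garlic": 1.0, "pasta": 1.0}), (1, {"tofu": 1.0})]
print(build_prototype(rated))
```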
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified, and can now be further improved
Table 5.3: Testing features.

                                    Epicurious        Food.com
                                    MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616    0.4324  0.7087
Ingredients                         0.7932  1.0054    0.4411  0.6537
Cuisine                             0.8553  1.0810    0.5357  0.7431
Dietary                             0.8772  1.0807    0.4579  0.7320
and adjusted to return the best recommendations.
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all the features are helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the method combination that performed best was the following:
• use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;
• use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector, the features were separated: in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective line of Table 5.3.
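The three stored vectors and their merging can be sketched as follows; the feature-name prefixes are a hypothetical namespacing used here to keep the feature types disjoint, not the dissertation's actual representation.

```python
def merge(*vectors):
    # Merge the per-feature-type vectors (ingredients, cuisines,
    # dietaries) kept for each user into one prototype vector.
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

def cosine(a, b):
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# One stored vector per feature type for a given user.
ingredients = {"ing:garlic": 1.0}
cuisines = {"cui:italian": 1.0}
dietaries = {"diet:vegetarian": 1.0}

recipe = {"ing:garlic": 1.0, "cui:italian": 1.0}
all_features = merge(ingredients, cuisines, dietaries)
print(cosine(all_features, recipe))
```

Testing a different feature combination only changes which stored vectors are merged, with no need to rebuild the prototypes.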
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about them. Although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset.
example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by the Rocchio algorithm into a rating value:

$$\text{Rating} = \begin{cases}
\text{average rating} + \text{standard deviation} & \text{if } \text{similarity} \geq U \\
\text{average rating} & \text{if } L \leq \text{similarity} < U \\
\text{average rating} - \text{standard deviation} & \text{if } \text{similarity} < L
\end{cases}$$
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and discover the similarity case thresholds that return the lowest error values.
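The threshold search can be sketched as a simple sweep; the evaluation callback and the toy error model below are assumptions for illustration, standing in for the actual cross-validation runs:

```python
def sweep_lower_threshold(evaluate, lower_values, upper=0.75):
    """Evaluate the recommender once per candidate lower threshold L and
    return the (L, MAE, RMSE) triple with the lowest MAE.  `evaluate` is
    a callback that runs the cross-validation tests for a given (L, U)
    pair and returns (mae, rmse)."""
    best = None
    for lower in lower_values:
        mae, rmse = evaluate(lower, upper)
        if best is None or mae < best[1]:
            best = (lower, mae, rmse)
    return best

# Toy evaluation callback in which the error grows with L (as observed in
# this test, removing the lower case entirely, L = 0, works best).
toy_evaluate = lambda lower, upper: (0.65 + 0.2 * lower, 0.86 + 0.3 * lower)
print(sweep_lower_threshold(toy_evaluate, [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]))
# -> (0.0, 0.65, 0.86)
```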
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point is positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.

Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
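The round-by-round simulation described above can be sketched as follows; the `train_and_evaluate` callback and the toy error model are illustrative stand-ins for the actual prototype-building and evaluation runs:

```python
def learning_curve(reviews, train_and_evaluate, start=1):
    """Simulate continuous learning for one user: at each round one more
    review is moved from the validation set into the training set, and
    the recommendation error on the remaining reviews is recorded."""
    errors = []
    for n in range(start, len(reviews)):
        training, validation = reviews[:n], reviews[n:]
        errors.append(train_and_evaluate(training, validation))
    return errors

# Toy error model in which the error shrinks as the training set grows.
toy_error = lambda training, validation: 1.0 / len(training)
curve = learning_curve(list(range(10)), toy_error)
print(curve[0], curve[-1])  # 1.0 0.1111111111111111
```

In the actual experiments, the resulting per-round errors would be averaged over all users above the chosen rating-count threshold.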
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested, to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contains the main ingredients, which were chosen by the users at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the validation of the studied approaches. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
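Following the general scheme of [20], the first stage of such a hybrid could fill the sparse rating matrix with content-based predictions before applying collaborative filtering; the sketch below is a hedged illustration, with the content-based predictor reduced to a stand-in callback and all names hypothetical:

```python
def densify_matrix(ratings, users, items, content_predict):
    """First stage of content-boosted CF: fill the missing cells of the
    user-item rating matrix with content-based predictions, producing a
    dense pseudo-ratings matrix for the collaborative filtering stage."""
    pseudo = {}
    for user in users:
        for item in items:
            actual = ratings.get((user, item))
            pseudo[(user, item)] = actual if actual is not None else content_predict(user, item)
    return pseudo

# Observed ratings are kept; gaps are filled by the stand-in predictor.
ratings = {("u1", "r1"): 5.0, ("u2", "r2"): 3.0}
pseudo = densify_matrix(ratings, ["u1", "u2"], ["r1", "r2"],
                        content_predict=lambda u, i: 4.0)
print(pseudo[("u1", "r1")], pseudo[("u1", "r2")])  # 5.0 4.0
```

A neighbourhood-based CF step would then run on `pseudo` instead of the sparse original matrix.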
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting direction to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
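This class-vector proposal could be sketched as below; the dictionary-based vectors and the use of cosine similarity to pick the winning class are assumptions about how the idea might be realized, not an implementation from the thesis:

```python
import math

def cosine(u, v):
    """Cosine similarity between sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(class_vectors, recipe_vector):
    """Return the rating value whose class vector is most similar to the
    recipe: the winning class *is* the predicted rating, so no separate
    similarity-to-rating transformation is needed."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vector))

# One class vector per observed rating value, built from the user's rated recipes.
class_vectors = {
    5: {"garlic": 0.9, "italian": 0.8},
    1: {"eggplant": 0.9},
}
print(predict_rating(class_vectors, {"garlic": 1.0}))  # 5
```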
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Table 4.1: Statistical characterization for the datasets used in the experiments

                                       Food.com   Epicurious
Number of users                          24741        8117
Number of food items                    226025       14976
Number of rating events                 956826       86574
Number of ratings above average         726467       46588
Number of groups                           108          68
Number of ingredients                     5074         338
Number of categories                        28          14
Sparsity of the ratings matrix           0.02%       0.07%
Average rating value                      4.68        3.34
Avg. number of ratings per user          38.67       10.67
Avg. number of ratings per item           4.23        5.78
Avg. number of ingredients per item       8.57        3.71
Avg. number of categories per item        2.33        0.60
Avg. number of food groups per item       0.87        0.61
user profile is high, then the recipe's features are similar to the user's preferences, which should yield a higher rating value for the recipe. Since the notion of a high rating value varies between users and recipes, their averages and standard deviations can help determine the final rating recommended for the recipe with more accuracy. Later, in Chapter 5, the upper and lower similarity threshold values used in this method, U and L respectively, will be optimized to obtain the best recommendation performance, but initially the upper threshold U is 0.75 and the lower threshold L is 0.25.
4.4 Database and Datasets
The database represented in the system architecture stores all the data required by the recommendation system in order to generate recommendations. The data for the experiments is provided by two datasets. The first dataset was previously made available by [25], collected from a large online4 recipe sharing community. The second dataset is composed of crawled data obtained from a website named Epicurious5. This dataset initially contained 51324 active users and 160536 rated recipes, but in order to reduce data sparsity the dataset was filtered: all recipes that were rated no more than 3 times were removed, as well as the users who rated no more than 5 times. In Table 4.1, a statistical characterization of the two datasets is presented, after the filter was applied.
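The filtering step can be sketched as below; the iteration until stability is an assumption (the thesis does not state whether the filter was applied once or repeatedly), as are all names:

```python
from collections import Counter

def filter_sparse(reviews, min_item_ratings=4, min_user_ratings=6):
    """Drop recipes rated no more than 3 times and users who rated no more
    than 5 times, repeating until stable, since each removal can push
    other counts below the thresholds."""
    while True:
        item_counts = Counter(item for _, item, _ in reviews)
        user_counts = Counter(user for user, _, _ in reviews)
        kept = [(u, i, r) for u, i, r in reviews
                if item_counts[i] >= min_item_ratings and user_counts[u] >= min_user_ratings]
        if len(kept) == len(reviews):
            return kept
        reviews = kept

# Dense core (6 users x 6 recipes, all rated) plus one sparse review.
core = [(f"u{u}", f"r{i}", 5) for u in range(6) for i in range(6)]
tail = [("u0", "r99", 3)]  # r99 is rated only once and gets filtered out
kept = filter_sparse(core + tail)
print(len(kept))  # 36
```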
Both datasets contain user reviews for specific recipes, where each recipe is characterized by the following features: ingredients, cuisine, and dietary. Here are some examples of these features:

• Ingredients: Bean, Cheese, Nut, Fish, Potato, Pepper, Basil, Eggplant, Chicken, Pasta, Poultry, Bacon, Tomato, Avocado, Shrimp, Rice, Pork, Shellfish, Peanut, Turkey, Spinach, Scallop, Lamb, Mint, Wine, Garlic, Beef, Citrus, Onion, Pear, Egg, Pecan, Apple, etc.

• Cuisines: Mediterranean, French, Spanish/Portuguese, Asian, North American, Chinese, Central/South American, European, Mexican, Latin American, American, Greek, Indian, German, Italian, etc.
4 http://www.food.com
5 http://www.epicurious.com
Figure 4.3: Distribution of Epicurious rating events per rating value

Figure 4.4: Distribution of Food.com rating events per rating value
• Dietaries: Vegetarian, Low Cal, Healthy, High Fiber, Wheat/Gluten-Free, Low Sodium, Low/No Sugar, Low Carb, Low Fat, Vegan, Low Cholesterol, Raw, Kosher, etc.
A recipe can have multiple cuisines, dietaries, and, as expected, multiple ingredients attributed to it. The main difference between the recipes' features in these datasets is the way that ingredients are represented: in Food.com, recipes are characterized by all the ingredients that compose them, whereas in Epicurious only the main ingredients are considered. The main ingredients of a recipe are chosen by the website users when performing a review.
Figures 4.3, 4.4, and 4.5 present some graphical statistical data about the datasets. Figures 4.3 and 4.4 display the distribution of rating events per rating value for each dataset. Figure 4.5 shows the distribution of the number of users per number of rated items for the Epicurious dataset. This last graph is not presented for the Food.com dataset because its curve would be very similar, since a decrease in the number of users as the number of rated items increases is a normal characteristic of rating event datasets.
Figure 4.5: Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6. Being a relational database, MySQL is excellent for representing and working with structured sets of data, which is perfectly adequate for the objectives of this work. The database stores all rating events, the recipe features (ingredients, cuisines, and dietaries), and the users' prototype vectors.
6 http://www.mysql.com
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the evaluation method and evaluation metrics are presented, followed by the discussion of the first experimental results and the baseline algorithms. In Section 5.3, a feature test is performed to determine the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test is performed to adjust the algorithm and seek improvements in the recommendation results. Finally, the last two sections focus on analysing two interesting topics of the recommendation process, using the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique is mainly used in systems that seek to estimate how accurately a predictive model will perform in practice [26]. The main goal of cross-validation is to isolate a segment of the known data, but instead of using it to train the model, this segment is used to evaluate the predictions made by the system during the training phase. This procedure provides insight into how the model will generalize to an independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging p observations as the validation set and the remaining observations as the training set. To reduce variability, this process is repeated multiple times, using different observations p as the validation set. Ideally, the process is repeated until all possible combinations of p are tested. The validation results are averaged over the number of times the process is repeated (see Fig. 5.1). In the experiments performed in this work, the chosen value for p was 5, so the process is repeated 5 times, also known as 5-fold cross-validation. For each fold, the validation set represents 20% and the training set the remaining 80% of the data.

Figure 5.1: 10-fold cross-validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the system (i.e., the prediction values). In the simplest case, the validation set presents information in the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
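A 5-fold split over records in this format can be sketched as below; the shuffling, the interleaved partitioning, and all names are illustrative assumptions:

```python
import random

def five_fold_splits(reviews, k=5, seed=42):
    """Yield (training, validation) pairs for k-fold cross-validation:
    each fold holds out ~20% of the reviews for validation and trains
    on the remaining ~80%."""
    shuffled = reviews[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation

# Records in the (userID, itemID, rating) format described above.
reviews = [(f"u{n % 7}", f"item{n}", 1 + n % 5) for n in range(100)]
for training, validation in five_fold_splits(reviews):
    assert len(training) == 80 and len(validation) == 20
```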
By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of the content-based algorithms.
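The two error measures can be computed directly from the paired lists of predicted and actual ratings; a minimal sketch:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: the average absolute deviation between the
    predicted and the actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    before averaging places more emphasis on larger errors."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predicted = [4.0, 3.0, 5.0, 2.0]
actual    = [4.0, 4.0, 3.0, 2.0]
print(mae(predicted, actual))   # 0.75
print(rmse(predicted, actual))  # ~1.118
```

The single large miss (5.0 vs 3.0) is what pushes the RMSE well above the MAE here, illustrating the emphasis on higher deviations.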
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baseline metrics were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: the user average rating, the recipe average rating, and the combined average of the user and item averages, i.e., (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, or the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                 Epicurious        Food.com
                                 MAE     RMSE      MAE     RMSE
YoLP content-based component     0.6389  0.8279    0.3590  0.6536
YoLP collaborative component     0.6454  0.8678    0.3761  0.6834
User Average                     0.6315  0.8338    0.4077  0.6207
Item Average                     0.7701  1.0930    0.4385  0.7043
Combined Average                 0.6628  0.8572    0.4180  0.6250

Table 5.2: Test results

Epicurious                                     Observation        Observation
                                               User Average       Fixed Threshold
                                               MAE     RMSE       MAE     RMSE
User Avg + User Standard Deviation             0.8217  1.0606     0.7759  1.0283
Item Avg + Item Standard Deviation             0.8914  1.1550     0.8388  1.1106
User/Item Avg + User and Item Std. Deviation   0.8304  1.0296     0.7824  0.9927
Min-Max                                        0.8539  1.1533     0.7721  1.0705

Food.com                                       Observation        Observation
                                               User Average       Fixed Threshold
                                               MAE     RMSE       MAE     RMSE
User Avg + User Standard Deviation             0.4448  0.6812     0.4287  0.6624
Item Avg + Item Standard Deviation             0.4561  0.7251     0.4507  0.7207
User/Item Avg + User and Item Std. Deviation   0.4390  0.6506     0.4324  0.6449
Min-Max                                        0.6648  0.9847     0.6303  0.9384
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the user's prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold. Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by Rocchio's algorithm into a rating value. These methods are represented in the line entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations has the overall lowest error values.

Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious        Food.com
                                    MAE     RMSE      MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927    0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012    0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986    0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616    0.4324  0.7087
Ingredients                         0.7932  1.0054    0.4411  0.6537
Cuisine                             0.8553  1.0810    0.5357  0.7431
Dietary                             0.8772  1.0807    0.4579  0.7320
5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods, it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.
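The first step of this combination, building the Rocchio prototype vector with a fixed rating threshold of 3, can be sketched as below; the equal positive/negative weighting and the handling of ratings exactly at the threshold are assumptions, not details stated in the thesis:

```python
def build_prototype(rated_recipes, threshold=3, alpha=1.0, beta=1.0):
    """Build a user's Rocchio prototype vector from (vector, rating) pairs.
    Recipes rated above the fixed threshold count as positive observations,
    the rest as negative; the prototype is the weighted positive centroid
    minus the weighted negative centroid."""
    positives = [vec for vec, rating in rated_recipes if rating > threshold]
    negatives = [vec for vec, rating in rated_recipes if rating <= threshold]
    prototype = {}
    for group, weight in ((positives, alpha), (negatives, -beta)):
        if not group:
            continue
        for vec in group:
            for feature, value in vec.items():
                prototype[feature] = prototype.get(feature, 0.0) + weight * value / len(group)
    return prototype

rated = [({"garlic": 1.0, "italian": 1.0}, 5),
         ({"garlic": 1.0}, 4),
         ({"eggplant": 1.0}, 1)]
proto = build_prototype(rated)
print(proto["garlic"], proto["eggplant"])  # 1.0 -1.0
```

Features from liked recipes end up with positive weights and features from disliked recipes with negative ones, which is what the subsequent cosine similarity against recipe vectors relies on.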
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Foodcom dataset With these feature tests in mind the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested so when computing the user prototype
vector the features where separated and in practice 3 vectors were created and stored for each
user This representation makes feature testing very easy to perform For each recommendation
when computing the cosine similarity between the userrsquos prototype vector and the recipersquos features
the composition of the prototype vector can be controlled as the 3 stored vectors can be easily
merged In the tests presented in the previous section the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries) so the same results can be observed in the respective
line of Table 53
Using more features to describe the items in content-based methods should in theory improve
the recommendations since we have more information available about them and although this is
confirmed in this test see Table 53 that may not always be the case Some features like for
38
Figure 52 Lower similarity threshold variation test using Epicurious dataset
example the price of the meal can increase the correlation between the user preferences and items
he dislikes so it is important to test the impact of every new feature before implementing it in the
recommendation system
54 Similarity Threshold Variation
Eq 46 previously presented in Section 433 and repeated here for convenience was used in the
first experiments to transform the similarity value returned by the Rocchiorsquos algorithm into a rating
value
Rating =
average rating + standard deviation if similarity gt= U
average rating if L lt= similarity lt U
average rating minus standard deviation if similarity lt L
The initial values for the thresholds 075 for U and 025 for L were good starting values to test
this method but now other cases need to be tested By fluctuating the case limits the objective of
this test is to study the impact in the recommendation and discover the similarity case thresholds
that return the lowest error values
Figures 52 and 53 illustrate the changes in MAE and RMSE values using the Epicurious and
Foodcom datasets respectively when adjusting the lower similarity threshold L in the range [0 025]
39
Figure 53 Lower similarity threshold variation test using Foodcom dataset
Figure 54 Upper similarity threshold variation test using Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating minus standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset

Rating = average rating + standard deviation,   if similarity >= U
         average rating,                        if similarity < U        (5.1)
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
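The sweep just described amounts to a simple grid search over the upper threshold. The sketch below uses invented validation tuples and a 0.05 step purely for illustration; the real experiments re-ran the full 5-fold cross-validation for each threshold value.

```python
import math
import statistics

def evaluate(upper, pairs):
    """Compute MAE and RMSE for one upper-threshold setting; `pairs` holds
    (similarity, average_rating, standard_deviation, actual_rating) tuples
    standing in for one cross-validation run."""
    errors = []
    for sim, avg, std, actual in pairs:
        predicted = avg + std if sim >= upper else avg  # Eq. 5.1
        errors.append(predicted - actual)
    mae = statistics.mean(abs(e) for e in errors)
    rmse = math.sqrt(statistics.mean(e * e for e in errors))
    return mae, rmse

# Invented validation tuples, purely for illustration
validation = [(0.9, 3.5, 0.8, 4.0), (0.4, 3.0, 1.0, 3.0), (0.6, 4.0, 0.5, 5.0)]

# Sweep the upper threshold over [0, 0.75] in 0.05 steps
thresholds = [u / 20 for u in range(16)]
sweep = {t: evaluate(t, validation) for t in thresholds}
best_mae_threshold = min(sweep, key=lambda t: sweep[t][0])
```

Keeping both MAE and RMSE per threshold matters here, since (as discussed next) the two metrics move in opposite directions.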
Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it is predicting the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher and, since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while in others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.
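The trade-off between the two metrics can be made concrete with a small sketch (the rating lists are illustrative, not the thesis data): two predictors with the same MAE can have very different RMSE values.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average absolute deviation between ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but larger deviations weigh more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Two predictors with the same MAE; the one with a single large miss has the
# higher RMSE, which is exactly the trade-off observed when lowering U.
actual = [4, 3, 5, 2]
steady = [3, 4, 4, 3]   # misses every rating by 1
spiky = [4, 3, 5, 6]    # exact three times, one miss by 4
print(mae(steady, actual), rmse(steady, actual))  # 1.0 1.0
print(mae(spiky, actual), rmse(spiky, actual))    # 1.0 2.0
```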
Compared directly with the YoLP content-based component, which obtained the overall lowest error rates from all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user; the point on the graph is positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
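The data behind these scatter plots can be computed per user as sketched below. The ratings are invented for illustration; `pstdev` is the population standard deviation, which is 0.0 for users who always give the same rating.

```python
import statistics

def user_error_vs_std(reviews, predictions):
    """Pair each user's rating standard deviation with the mean absolute
    error of the predictions made for that user. `reviews` maps userID to
    a list of actual ratings and `predictions` maps userID to the predicted
    ratings, in the same order."""
    points = {}
    for user, actuals in reviews.items():
        preds = predictions[user]
        std = statistics.pstdev(actuals)  # 0.0 for users who always rate the same
        abs_err = statistics.mean(abs(p - a) for p, a in zip(preds, actuals))
        points[user] = (std, abs_err)
    return points

# Invented data: user "a" always rates 4; user "b" varies a lot
reviews = {"a": [4, 4, 4], "b": [1, 5, 3]}
predictions = {"a": [4, 4, 4], "b": [2, 4, 3]}
points = user_error_vs_std(reviews, predictions)
```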
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error was noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined amount of reviews are made. In order to perform this test, first the datasets were analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable amount of users to average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

In Fig. 5.8, it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
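The round-by-round simulation described above can be sketched as follows. The `build_prototype` and `predict` parameters stand in for the real Rocchio prototype construction and rating prediction; the trivial averaging stand-ins at the bottom are only there to make the sketch self-contained.

```python
def learning_curve(user_reviews, build_prototype, predict, max_rated=40):
    """Simulate continuous learning: in round k, the first k reviews of each
    user form the training set and the remaining ones the validation set.
    Returns the average absolute error per round, across all users."""
    rounds = []
    for k in range(1, max_rated):
        errors = []
        for reviews in user_reviews:
            train, validation = reviews[:k], reviews[k:]
            if not validation:
                continue
            prototype = build_prototype(train)
            errors.extend(abs(predict(prototype, item) - rating)
                          for item, rating in validation)
        rounds.append(sum(errors) / len(errors))
    return rounds

# Trivial stand-ins: the real system would build a Rocchio prototype vector
# and predict via similarity; here the "prototype" is just the mean rating.
build_avg = lambda train: sum(rating for _, rating in train) / len(train)
predict_avg = lambda prototype, item: prototype

users = [[("recipe%d" % i, 3 + i % 2) for i in range(10)]]  # ratings 3,4,3,4,...
curve = learning_curve(users, build_avg, predict_avg, max_rated=10)
```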
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods in personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches combined returned the best performance values of the experimental recommendation component.
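As a rough sketch of this configuration (not the thesis implementation): the alpha/beta weights are the usual Rocchio parameters with illustrative values, and the assumption that ratings above the fixed threshold count as positive is hypothetical.

```python
def build_prototype(rated_recipes, alpha=1.0, beta=1.0, threshold=3):
    """Rocchio-style prototype: add the feature vectors of positively rated
    recipes and subtract those of negatively rated ones, each averaged over
    its class. A fixed rating threshold separates positive from negative
    observations, as in the best-performing configuration described above."""
    positive = [features for features, rating in rated_recipes if rating > threshold]
    negative = [features for features, rating in rated_recipes if rating <= threshold]
    prototype = {}
    for group, weight in ((positive, alpha), (negative, -beta)):
        if not group:
            continue
        for features in group:
            for f, w in features.items():
                prototype[f] = prototype.get(f, 0.0) + weight * w / len(group)
    return prototype

# Hypothetical IRF-weighted recipe vectors with ratings on a 1-5 scale
rated = [({"garlic": 0.8, "italian": 0.6}, 5),
         ({"cilantro": 0.9}, 1)]
prototype = build_prototype(rated)
```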
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and, adding the major difference in the dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that best suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, the total calories, amongst others. The study of the impact that these features have on the recommendation is also another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
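A minimal sketch of this class-vector idea, with invented per-class vectors (in the proposed design, the real class vectors would be built from the user's rated recipes, one per observed rating value):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(class_vectors, recipe_features):
    """One prototype vector per rating class; the predicted rating is the
    class whose vector is most similar to the recipe's feature vector."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_features))

# Hypothetical per-rating class vectors for one user (ratings on a 1-5 scale)
class_vectors = {
    1: {"cilantro": 0.9},
    5: {"garlic": 0.8, "italian": 0.6},
}
recipe = {"garlic": 1.0, "basil": 0.5}
print(predict_rating(class_vectors, recipe))  # 5
```

The most similar class directly supplies the predicted rating, so no similarity-to-rating conversion step is needed.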
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
Figure 43 Distribution of Epicurious rating events per rating values
Figure 44 Distribution of Foodcom rating events per rating values
bull Dietaries Vegetarian Low Cal Healthy High Fiber WheatGluten-Free Low Sodium LowNo
Sugar Low Carb Low Fat Vegan Low Cholesterol Raw Kosher etc
A recipe can have multiple cuisines dietaries and as expected multiple ingredients attributed to
it The main difference between the recipersquos features in these datasets is the way that ingredients are
represented In Foodcom recipes are characterized by all the ingredients that compose it where in
Epicurious only the main ingredients are considered The main ingredients for a recipe are chosen
by the web site users when performing a review
In Figures 43 44 and 45 some graphical statistical data of the datasets is presented Figure
43 and 44 displays the distribution of the rating events per rating values for each dataset Figure 45
shows the distribution of the number of users per number of rated items for the Epicurious dataset
This last graph is not presented for the Foodcom dataset because its curve would be very similar
since a decrease in the number of users when the number of rated items increases is a normal
characteristic of rating event datasets
32
Figure 45 Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6 Being a relational database MySQL is excel-
lent for representing and working with structured sets of data which is perfectly adequate for the
objectives of this work The database stores all rating events recipe features (ingredients cuisines
and dietaries) and the usersrsquo prototype vectors
6httpwwwmysqlcom
33
34
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work First the
evaluation method and evaluation metrics are presented followed by the discussion of the first ex-
perimental results and baselines algorithms In Section 53 a feature test is performed to determine
the features that are crucial for the best recommendations In Section 54 a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results Finally
the last two sections focus on analysing two interesting topics of the recommendation process using
the algorithm that showed the best results
51 Evaluation Metrics and Cross Validation
Cross-validation was used to validate the recommendation components in this work This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26] The main goal of cross-validation is to isolate a segment of the known data but instead
of using it to train the model this segment is used to evaluate the predictions made by the system
during the training phase This procedure provides an insight on how the model will generalize to an
independent dataset More specifically leave-p-out cross-validation method was used leveraging
p observations as the validation set and the remaining observations as the training set To reduce
variability this process is repeated multiple times using different observations p as the validation set
Ideally this process is repeated until all possible combinations of p are tested The validation results
are averaged over the number of times the process is repeated (see Fig 51) In the experiments
performed in this work the chosen value for p was 5 so the process is repeated 5 times also known
as 5-fold cross-validation For each fold the validation set represents 20 and the training set the
remaining 80 of the data
Accuracy is measured when comparing the known data from the validation set with the outputs of
the system (ie the prediction values) In the simplest case the validation set presents information
35
Figure 51 10 Fold Cross-Validation example
in the following format
bull User identification userID
bull Item identification itemID
bull Rating attributed by the userID to the itemID rating
By providing the recommendation system with the userID and itemID as inputs the algorithms
generate a prediction value (rating) for that item This value is estimated based on the userrsquos previ-
ously rated items learned from the training set
Using the correct rating values obtained from the validation set and the generated predictions
created by the algorithms the MAE and RMSE measures can be computed As previously men-
tioned in Section 22 these measures compute the deviation between the predicted ratings and the
actual ratings The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components as well as to validate new variations of
context-based algorithms
52 Baselines and First Results
In order to validate the experimental context-based algorithms explored in this work first some base-
lines need to be computed Using the 5-fold cross-validation the YoLP recommendation components
presented in the previous Chapter (Section 41 and 42) were evaluated Besides these methods a
few simple baselines metrics were also computed using the direct values of specific dataset aver-
ages as the predicted rating for the recommendations The averages computed were the following
user average rating recipe average rating and the combined average of the user and item aver-
ages ie (UserAvg + ItemAvg)2 In other words when receiving the userID and recipeID as
36
Table 51 Baselines
Epicurious FoodcomMAE RMSE MAE RMSE
YoLP Content-basedcomponent
06389 08279 03590 06536
YoLP Collaborativecomponent
06454 08678 03761 06834
User Average 06315 08338 04077 06207Item Average 07701 10930 04385 07043Combined Average 06628 08572 04180 06250
Table 52 Test Results
Epicurious FoodcomObservationUser Average
ObservationFixed Thresh-old
User AverageObservation
ObservationFixed Thresh-old
MAE RMSE MAE RMSE MAE RMSE MAE RMSEUser Avg + User Stan-dard Deviation
08217 10606 07759 10283 04448 06812 04287 06624
Item Avg + Item Stan-dard Deviation
08914 11550 08388 11106 04561 07251 04507 07207
UserItem Avg + Userand Item Standard De-viation
08304 10296 07824 09927 04390 06506 04324 06449
Min-Max 08539 11533 07721 10705 06648 09847 06303 09384
inputs the recommendation system simply returns the userID average or the recipeID average or
the combination of both Table 51 contains the MAE and RMSE values for the baseline methods
As detailed in Section 43 the experimental recommendation component uses the well-known
Rocchiorsquos algorithm and seeks to adapt it to food recommendations Two distinct ways of building
the userrsquos prototype vectors were presented using the user average rating value as threshold for
positive and negative observations or simply using a fixed threshold in the middle of the rating
range considering as positive observations the highest rating values and as negative the lowest
These are referred in Table 52 as Observation User Average and Observation Fixed Threshold
Also detailed in Section 43 a few different methods are used to convert the similarity value returned
from Rocchiorsquos algorithm into a rating value These methods are represented in the line entries of
the Table 52 and are referred to as User Avg + User Standard Deviation Item Avg + Item Standard
Deviation UserItem Avg + User and Item Standard Deviation and Min-Max
Table 52 contains the first test results of the experiments using 5-fold cross-validation The
objective was to determine which method combination had the best performance so it could be
further adjusted and improved When observing the MAE and RMSE values it is clear that using the
user average as threshold to build the prototype vectors results in higher error values than the fixed
threshold of 3 to separate the positive and negative observations The second conclusion that can
be made from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values
Although the first results do not surpass most of the baselines in terms of performance the
experimental methods with the best performances were identified and can now be further improved
37
Table 53 Testing features
Epicurious FoodcomMAE RMSE MAE RMSE
Ingredients + Cuisine +Dietaries
07824 09927 04324 06449
Ingredients + Cuisine 07915 10012 04384 06502Ingredients + Dietary 07874 09986 04342 06468Cuisine + Dietary 08266 10616 04324 07087Ingredients 07932 10054 04411 06537Cuisine 08553 10810 05357 07431Dietary 08772 10807 04579 07320
and adjusted to return the best recommendations
53 Feature Testing
As detailed in Section 44 each recipe is characterized by the following features ingredients cuisine
and dietary In content-based methods it is important to determine if all features are helping to obtain
the best recommendations so feature testing is crucial
In the previous Section we concluded that the method combination that performed the best was
the following
bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors
bull Use the combination of both user and item average ratings and standard deviations to trans-
form the similarity value into a rating value
From this point on all the experiments performed use this method combination
Computing the userrsquos prototype vectors for all 5 folds is a time consuming process especially
for the Foodcom dataset With these feature tests in mind the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested so when computing the user prototype
vector the features where separated and in practice 3 vectors were created and stored for each
user This representation makes feature testing very easy to perform For each recommendation
when computing the cosine similarity between the userrsquos prototype vector and the recipersquos features
the composition of the prototype vector can be controlled as the 3 stored vectors can be easily
merged In the tests presented in the previous section the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries) so the same results can be observed in the respective
line of Table 53
Using more features to describe the items in content-based methods should in theory improve
the recommendations since we have more information available about them and although this is
confirmed in this test see Table 53 that may not always be the case Some features like for
38
Figure 52 Lower similarity threshold variation test using Epicurious dataset
example the price of the meal can increase the correlation between the user preferences and items
he dislikes so it is important to test the impact of every new feature before implementing it in the
recommendation system
54 Similarity Threshold Variation
Eq 46 previously presented in Section 433 and repeated here for convenience was used in the
first experiments to transform the similarity value returned by the Rocchiorsquos algorithm into a rating
value
Rating =
average rating + standard deviation if similarity gt= U
average rating if L lt= similarity lt U
average rating minus standard deviation if similarity lt L
The initial values for the thresholds 075 for U and 025 for L were good starting values to test
this method but now other cases need to be tested By fluctuating the case limits the objective of
this test is to study the impact in the recommendation and discover the similarity case thresholds
that return the lowest error values
Figures 52 and 53 illustrate the changes in MAE and RMSE values using the Epicurious and
Foodcom datasets respectively when adjusting the lower similarity threshold L in the range [0 025]
39
Figure 53 Lower similarity threshold variation test using Foodcom dataset
Figure 54 Upper similarity threshold variation test using Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect in the recommendation
accuracy and subtracting the standard deviation does not help The accentuated drop in error value
seen in the graph from Fig 52 occurs when the lower case (average rating minus standard deviation)
is completely removed
As a result of these tests Eq 46 was updated to
40
Figure 55 Upper similarity threshold variation test using Foodcom dataset
Rating =
average rating + standard deviation if similarity gt= U
average rating if similarity lt U
(51)
Using Eq 51 it is time to test the upper similarity threshold Figures 54 and 55 present the test
results for theEpicurious and Foodcom datasets respectively For each similarity value represented
by the points in the graphs the MAE and RMSE were obtained using the same cross-validation tests
multiple times on the experimental recommendation component adjusting the upper similarity value
between each test
The results obtained were interesting As mentioned in Section 22 MAE computes the devia-
tion between predicted ratings and actual ratings RMSE is very similar to MAE but places more
emphasis on higher deviations These definitions help to understand the results of this test Both
datasets react in a very similar way when the threshold U is varied in the interval [0 075] The MAE
decreases and the RMSE increases When lowering the similarity threshold the recommendation
system predicts the correct rating value more times which results in a lower average error so the
MAE is lower But although it is predicting the exact rating value more times in the cases where it
misses the deviation between the predicted rating and the actual rating is higher and since RMSE
places more emphasis on higher deviations the RMSE values increase The best similarity threshold
is subjective some systems may benefit more from a higher rate of exact predictions while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable In this test
41
Figure 56 Mapping of the userrsquos absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 06544 MAE and 08601 RMSE
With the Foodcom dataset the lowest MAE registered was 03229 and the lowest RMSE 06230
Compared directly with the YoLP Content-based component which obtained the overall lowest
error rates from all the baselines the experimental recommendation component showed better re-
sults when using the Foodcom dataset
55 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating to all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 presents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would be a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset, and the lighter density of points in the graph towards the higher standard deviation values, there was probably not enough data on users with high deviation for the absolute error to stagnate.
Figure 5.7: Mapping of the user's absolute error and standard deviation, from the Food.com dataset.
Fig. 5.7 presents good results, showing that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
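The per-user mapping behind these figures can be sketched as follows; the review triples, user identifiers, and the choice of the population standard deviation are illustrative assumptions, not the thesis's actual data or code:

```python
import statistics
from collections import defaultdict

# Hypothetical review data: (user_id, actual_rating, predicted_rating).
reviews = [
    ("u1", 4, 4), ("u1", 4, 4), ("u1", 4, 3),                # low-variance user
    ("u2", 5, 4), ("u2", 1, 2), ("u2", 3, 4), ("u2", 5, 5),  # high-variance user
]

ratings = defaultdict(list)
errors = defaultdict(list)
for user, actual, predicted in reviews:
    ratings[user].append(actual)
    errors[user].append(abs(actual - predicted))

# One point per user: (rating standard deviation, mean absolute error).
points = {
    user: (statistics.pstdev(ratings[user]), statistics.mean(errors[user]))
    for user in ratings
}
for user, (std, err) in sorted(points.items()):
    print(user, round(std, 3), round(err, 3))
```

Plotting the resulting (standard deviation, mean absolute error) pairs, one per user, yields a scatter of the kind shown in Figures 5.6 and 5.7.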
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since, in a real system, the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvement in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the measured errors (as seen in Fig. 5.9), another test was made using the 269 users that rated over 500 recipes (as seen in Fig. 5.10).
The training set represents the recipes used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
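The round-by-round protocol just described can be sketched generically; `build_model` and `predict` are stand-ins for the thesis's Rocchio-based component, and the toy model below simply predicts the user's running average rating over synthetic data:

```python
# Sketch of the learning-curve protocol: at each round one review moves
# from the validation set into the training set, the model is rebuilt,
# and the MAE on the remaining validation reviews is recorded.
def learning_curve(user_reviews, build_model, predict):
    training, validation = [], list(user_reviews)
    curve = []
    while len(validation) > 1:
        training.append(validation.pop(0))
        model = build_model(training)
        error = sum(abs(r - predict(model, item)) for item, r in validation)
        curve.append(error / len(validation))
    return curve

# Toy stand-ins: the "model" is just the user's average rating so far.
reviews = [("r%d" % i, r) for i, r in enumerate([4, 5, 4, 3, 4, 4, 5, 4])]
build = lambda train: sum(r for _, r in train) / len(train)
pred = lambda model, item: model
print([round(m, 3) for m in learning_curve(reviews, build, pred)])
```

Averaging such curves over all users with enough rated recipes gives plots of the kind shown in Figures 5.8 to 5.10.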
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both the user and item average ratings and standard deviations showed the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
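A minimal sketch of this best-performing combination is given below; the feature vectors, the simple centroid-difference form of Rocchio, and the parameter values are simplifying assumptions for illustration, not the thesis's exact implementation:

```python
import math
from collections import defaultdict

# Sketch of the winning approach: a fixed threshold of 3 splits rated
# recipes into positive and negative observations, and the prototype is
# the mean positive vector minus the mean negative vector.
def build_prototype(rated_recipes):
    positive = [vec for vec, rating in rated_recipes if rating > 3]
    negative = [vec for vec, rating in rated_recipes if rating <= 3]
    prototype = defaultdict(float)
    for group, sign in ((positive, 1.0), (negative, -1.0)):
        for vec in group:
            for feature, weight in vec.items():
                prototype[feature] += sign * weight / max(len(group), 1)
    return prototype

def cosine(u, v):
    dot = sum(u[f] * v.get(f, 0.0) for f in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(similarity, avg, std, upper=0.75):
    # Similarity-to-rating transform; avg and std stand for the combined
    # user/item average rating and standard deviation.
    return avg + std if similarity >= upper else avg

# Hypothetical recipes as {feature: weight} vectors.
rated = [({"garlic": 1.0, "pasta": 1.0}, 5), ({"tofu": 1.0}, 2)]
proto = build_prototype(rated)
sim = cosine(proto, {"garlic": 1.0, "pasta": 1.0, "basil": 1.0})
print(round(sim, 3), predict_rating(sim, avg=3.5, std=0.9))  # 0.667 3.5
```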
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contained its main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors and, adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance improvement of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can also be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, or the total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting point to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of class vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
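A minimal sketch of this alternative, with hypothetical feature vectors and plain cosine similarity, could look like the following:

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(u[f] * v.get(f, 0.0) for f in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# One class vector per observed rating value: the centroid of the
# feature vectors of the recipes the user rated with that value.
def build_class_vectors(rated_recipes):
    classes = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, rating in rated_recipes:
        counts[rating] += 1
        for feature, weight in vec.items():
            classes[rating][feature] += weight
    return {
        r: {f: w / counts[r] for f, w in vec.items()}
        for r, vec in classes.items()
    }

# Predicted rating = the rating class whose vector is most similar.
def predict(class_vectors, recipe_vec):
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vec))

rated = [
    ({"garlic": 1.0, "pasta": 1.0}, 5),
    ({"garlic": 1.0, "basil": 1.0}, 5),
    ({"tofu": 1.0, "kale": 1.0}, 2),
]
vectors = build_class_vectors(rated)
print(predict(vectors, {"garlic": 1.0, "bread": 1.0}))  # 5
```

Here the class label returned by `predict` is itself the predicted rating, so no separate similarity-to-rating transformation is needed.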
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 01635840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 10414347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 09241868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 09241868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 10468188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 10450823.
Figure 45 Epicurious distribution of the number of ratings per number of users
The database used to store the data is MySQL6 Being a relational database MySQL is excel-
lent for representing and working with structured sets of data which is perfectly adequate for the
objectives of this work The database stores all rating events recipe features (ingredients cuisines
and dietaries) and the usersrsquo prototype vectors
6httpwwwmysqlcom
33
34
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work First the
evaluation method and evaluation metrics are presented followed by the discussion of the first ex-
perimental results and baselines algorithms In Section 53 a feature test is performed to determine
the features that are crucial for the best recommendations In Section 54 a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results Finally
the last two sections focus on analysing two interesting topics of the recommendation process using
the algorithm that showed the best results
51 Evaluation Metrics and Cross Validation
Cross-validation was used to validate the recommendation components in this work This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26] The main goal of cross-validation is to isolate a segment of the known data but instead
of using it to train the model this segment is used to evaluate the predictions made by the system
during the training phase This procedure provides an insight on how the model will generalize to an
independent dataset More specifically leave-p-out cross-validation method was used leveraging
p observations as the validation set and the remaining observations as the training set To reduce
variability this process is repeated multiple times using different observations p as the validation set
Ideally this process is repeated until all possible combinations of p are tested The validation results
are averaged over the number of times the process is repeated (see Fig 51) In the experiments
performed in this work the chosen value for p was 5 so the process is repeated 5 times also known
as 5-fold cross-validation For each fold the validation set represents 20 and the training set the
remaining 80 of the data
Accuracy is measured when comparing the known data from the validation set with the outputs of
the system (ie the prediction values) In the simplest case the validation set presents information
35
Figure 51 10 Fold Cross-Validation example
in the following format
bull User identification userID
bull Item identification itemID
bull Rating attributed by the userID to the itemID rating
By providing the recommendation system with the userID and itemID as inputs the algorithms
generate a prediction value (rating) for that item This value is estimated based on the userrsquos previ-
ously rated items learned from the training set
Using the correct rating values obtained from the validation set and the generated predictions
created by the algorithms the MAE and RMSE measures can be computed As previously men-
tioned in Section 22 these measures compute the deviation between the predicted ratings and the
actual ratings The results obtained from the evaluation module are used to directly compare the
performance of the different recommendation components as well as to validate new variations of
context-based algorithms
52 Baselines and First Results
In order to validate the experimental context-based algorithms explored in this work first some base-
lines need to be computed Using the 5-fold cross-validation the YoLP recommendation components
presented in the previous Chapter (Section 41 and 42) were evaluated Besides these methods a
few simple baselines metrics were also computed using the direct values of specific dataset aver-
ages as the predicted rating for the recommendations The averages computed were the following
user average rating recipe average rating and the combined average of the user and item aver-
ages ie (UserAvg + ItemAvg)2 In other words when receiving the userID and recipeID as
36
Table 51 Baselines
Epicurious FoodcomMAE RMSE MAE RMSE
YoLP Content-basedcomponent
06389 08279 03590 06536
YoLP Collaborativecomponent
06454 08678 03761 06834
User Average 06315 08338 04077 06207Item Average 07701 10930 04385 07043Combined Average 06628 08572 04180 06250
Table 52 Test Results
Epicurious FoodcomObservationUser Average
ObservationFixed Thresh-old
User AverageObservation
ObservationFixed Thresh-old
MAE RMSE MAE RMSE MAE RMSE MAE RMSEUser Avg + User Stan-dard Deviation
08217 10606 07759 10283 04448 06812 04287 06624
Item Avg + Item Stan-dard Deviation
08914 11550 08388 11106 04561 07251 04507 07207
UserItem Avg + Userand Item Standard De-viation
08304 10296 07824 09927 04390 06506 04324 06449
Min-Max 08539 11533 07721 10705 06648 09847 06303 09384
inputs the recommendation system simply returns the userID average or the recipeID average or
the combination of both Table 51 contains the MAE and RMSE values for the baseline methods
As detailed in Section 43 the experimental recommendation component uses the well-known
Rocchiorsquos algorithm and seeks to adapt it to food recommendations Two distinct ways of building
the userrsquos prototype vectors were presented using the user average rating value as threshold for
positive and negative observations or simply using a fixed threshold in the middle of the rating
range considering as positive observations the highest rating values and as negative the lowest
These are referred in Table 52 as Observation User Average and Observation Fixed Threshold
Also detailed in Section 43 a few different methods are used to convert the similarity value returned
from Rocchiorsquos algorithm into a rating value These methods are represented in the line entries of
the Table 52 and are referred to as User Avg + User Standard Deviation Item Avg + Item Standard
Deviation UserItem Avg + User and Item Standard Deviation and Min-Max
Table 52 contains the first test results of the experiments using 5-fold cross-validation The
objective was to determine which method combination had the best performance so it could be
further adjusted and improved When observing the MAE and RMSE values it is clear that using the
user average as threshold to build the prototype vectors results in higher error values than the fixed
threshold of 3 to separate the positive and negative observations The second conclusion that can
be made from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values
Although the first results do not surpass most of the baselines in terms of performance the
experimental methods with the best performances were identified and can now be further improved
37
Table 53 Testing features
Epicurious FoodcomMAE RMSE MAE RMSE
Ingredients + Cuisine +Dietaries
07824 09927 04324 06449
Ingredients + Cuisine 07915 10012 04384 06502Ingredients + Dietary 07874 09986 04342 06468Cuisine + Dietary 08266 10616 04324 07087Ingredients 07932 10054 04411 06537Cuisine 08553 10810 05357 07431Dietary 08772 10807 04579 07320
and adjusted to return the best recommendations
53 Feature Testing
As detailed in Section 44 each recipe is characterized by the following features ingredients cuisine
and dietary In content-based methods it is important to determine if all features are helping to obtain
the best recommendations so feature testing is crucial
In the previous Section we concluded that the method combination that performed the best was
the following
bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors
bull Use the combination of both user and item average ratings and standard deviations to trans-
form the similarity value into a rating value
From this point on all the experiments performed use this method combination
Computing the userrsquos prototype vectors for all 5 folds is a time consuming process especially
for the Foodcom dataset With these feature tests in mind the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested so when computing the user prototype
vector the features where separated and in practice 3 vectors were created and stored for each
user This representation makes feature testing very easy to perform For each recommendation
when computing the cosine similarity between the userrsquos prototype vector and the recipersquos features
the composition of the prototype vector can be controlled as the 3 stored vectors can be easily
merged In the tests presented in the previous section the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries) so the same results can be observed in the respective
line of Table 53
Using more features to describe the items in content-based methods should in theory improve
the recommendations since we have more information available about them and although this is
confirmed in this test see Table 53 that may not always be the case Some features like for
38
Figure 52 Lower similarity threshold variation test using Epicurious dataset
example the price of the meal can increase the correlation between the user preferences and items
he dislikes so it is important to test the impact of every new feature before implementing it in the
recommendation system
54 Similarity Threshold Variation
Eq 46 previously presented in Section 433 and repeated here for convenience was used in the
first experiments to transform the similarity value returned by the Rocchiorsquos algorithm into a rating
value
Rating =
average rating + standard deviation if similarity gt= U
average rating if L lt= similarity lt U
average rating minus standard deviation if similarity lt L
The initial values for the thresholds 075 for U and 025 for L were good starting values to test
this method but now other cases need to be tested By fluctuating the case limits the objective of
this test is to study the impact in the recommendation and discover the similarity case thresholds
that return the lowest error values
Figures 52 and 53 illustrate the changes in MAE and RMSE values using the Epicurious and
Foodcom datasets respectively when adjusting the lower similarity threshold L in the range [0 025]
39
Figure 53 Lower similarity threshold variation test using Foodcom dataset
Figure 54 Upper similarity threshold variation test using Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect in the recommendation
accuracy and subtracting the standard deviation does not help The accentuated drop in error value
seen in the graph from Fig 52 occurs when the lower case (average rating minus standard deviation)
is completely removed
As a result of these tests Eq 46 was updated to
40
Figure 55 Upper similarity threshold variation test using Foodcom dataset
Rating =
average rating + standard deviation if similarity gt= U
average rating if similarity lt U
(51)
Using Eq 51 it is time to test the upper similarity threshold Figures 54 and 55 present the test
results for theEpicurious and Foodcom datasets respectively For each similarity value represented
by the points in the graphs the MAE and RMSE were obtained using the same cross-validation tests
multiple times on the experimental recommendation component adjusting the upper similarity value
between each test
The results obtained were interesting As mentioned in Section 22 MAE computes the devia-
tion between predicted ratings and actual ratings RMSE is very similar to MAE but places more
emphasis on higher deviations These definitions help to understand the results of this test Both
datasets react in a very similar way when the threshold U is varied in the interval [0 075] The MAE
decreases and the RMSE increases When lowering the similarity threshold the recommendation
system predicts the correct rating value more times which results in a lower average error so the
MAE is lower But although it is predicting the exact rating value more times in the cases where it
misses the deviation between the predicted rating and the actual rating is higher and since RMSE
places more emphasis on higher deviations the RMSE values increase The best similarity threshold
is subjective some systems may benefit more from a higher rate of exact predictions while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable In this test
41
Figure 56 Mapping of the userrsquos absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 06544 MAE and 08601 RMSE
With the Foodcom dataset the lowest MAE registered was 03229 and the lowest RMSE 06230
Compared directly with the YoLP Content-based component which obtained the overall lowest
error rates from all the baselines the experimental recommendation component showed better re-
sults when using the Foodcom dataset
55 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings the user standard deviation plays an important
role in the recommendation error Users with the standard deviation equal to zero ie users that
attributed the same rating to all their reviews should have the lowest impact in the recommendation
error The objective of this test is to discover how significant is the impact of this variable and if the
absolute error does not spike for users with higher standard deviations
In the Figures 56 and 57 each point represents a user the point on the graph is positioned
according to the userrsquos absolute error and standard deviation values The line in these two graphs
indicates the average value of the points in that proximity
Fig 56 represents the data from the Epicurious dataset The result for this dataset was expected
since it is normal for the absolute error to slowly increase for users with higher standard deviations
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation which would imply that the recommendation algorithm was having a very small
impact in the predicted ratings Having in consideration the small dimensionality of this dataset and
the lighter density of points in the graph towards the higher values of standard deviation probably
42
Figure 57 Mapping of the userrsquos absolute error and standard deviation from the Foodcom dataset
there was not enough data on users with high deviation for the absolute error to stagnate
Fig 57 presents good results since it shows that the absolute error of the users starts to stagnate
for users with standard deviations higher than 1 This implies that the algorithm is learning the userrsquos
preferences and returning good recommendations even to users with high standard deviations
56 Rocchiorsquos Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the userrsquos preferences
and the recipe features Since the userrsquos preferences in a real system are built over time the objec-
tive of this test is to simulate the continuous learning of the algorithm using the datasets studied in
this work and analyse if the recommendation error starts to converge after a determined amount of
reviews are made In order to perform this test first the datasets were analysed to find a group of
users with enough recipes rated to study the improvements in the recommendation The Epicurious
dataset contains 71 users that rated over 40 recipes This number of recipes rated was the highest
chosen threshold for this dataset in order to maintain a considerable amount of users to average the
recommendation errors from see Fig 58 In Foodcom 1571 users were found that rated over 100
recipes and since the results of this experiment showed a consistent drop in the errors measured
as seen in Fig 59 another test was made using the 269 users that rated over 500 recipes as seen
in Fig 510
The training set represents the recipes that are used to build the usersrsquo prototype vectors so for
each round an additional review is added to the training set and removed from the validation set
in order to simulate the learning process In Fig 58 it is possible to see a steady decrease in
43
Figure 58 Learning Curve using the Epicurious dataset up to 40 rated recipes
Figure 59 Learning Curve using the Foodcom dataset up to 100 rated recipes
error and after 25 recipes rated the error fluctuates around the same values Although it would be
interesting to perform this experiment with a higher number of rated recipes too see the progress
of the recommendation error due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test The Figures 59 and 510 show a constant
improvement in the recommendation although there is not a clear number of rated recipes that marks
44
Figure 510 Learning Curve using the Foodcom dataset up to 500 rated recipes
a threshold where the recommendation error stagnates
45
46
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the idea of breaking recipes down into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. These approaches, combined, returned the best performance values of the experimental recommendation component.
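A minimal sketch of the best-performing combination summarized above, assuming cosine similarity over feature-weight vectors; the exact Rocchio weighting and the way user and item statistics are combined are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def build_prototype(recipe_vectors, ratings, threshold=3, beta=1.0, gamma=1.0):
    """Rocchio prototype: recipes rated above the fixed threshold pull the
    vector towards them; the rest push it away. The fixed threshold of 3 is
    the variant that performed best; beta/gamma are standard Rocchio weights."""
    pos = [v for v, r in zip(recipe_vectors, ratings) if r > threshold]
    neg = [v for v, r in zip(recipe_vectors, ratings) if r <= threshold]
    proto = np.zeros_like(recipe_vectors[0], dtype=float)
    if pos:
        proto += beta * np.mean(pos, axis=0)
    if neg:
        proto -= gamma * np.mean(neg, axis=0)
    return proto

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def similarity_to_rating(sim, user_avg, user_std, item_avg, item_std, upper=0.75):
    """Combined user/item variant: predict around the mean of the user and
    item averages, adding the averaged standard deviation when the similarity
    is high. The exact combination is an assumption based on the description."""
    avg = (user_avg + item_avg) / 2
    std = (user_std + item_std) / 2
    return avg + std if sim >= upper else avg
```

The `upper` threshold of 0.75 mirrors the starting value used in the threshold-variation experiments.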
After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, such as the content-based method implemented in YoLP, registered lower error values.

Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance improvement of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
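The hybrid scheme of [20] can be sketched as follows: the content-based predictor first fills in the missing entries of the user-item rating matrix, and a collaborative step then runs on the densified matrix. The function names and the deliberately simple item-mean collaborative step are illustrative assumptions; the original work uses neighbourhood-based CF with self-weighting:

```python
import numpy as np

def content_boosted_cf(ratings, content_predict, target_user, target_item):
    """Sketch of content-boosted collaborative filtering:
    1. densify the rating matrix with content-based predictions,
    2. apply a collaborative step on the densified matrix.
    `ratings` is a dict {(user, item): rating}; missing pairs are unrated.
    `content_predict(user, item)` is the content-based component."""
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    # Step 1: pseudo-ratings, actual rating where known, content prediction elsewhere
    dense = {(u, i): ratings.get((u, i), content_predict(u, i))
             for u in users for i in items}
    # Step 2: a simple collaborative step (mean pseudo-rating of the other users)
    peers = [dense[(u, target_item)] for u in users if u != target_user]
    return float(np.mean(peers)) if peers else content_predict(target_user, target_item)
```

Comparing this predictor's MAE against the pure content-based component, under the same 5-fold cross-validation, would answer the question posed above.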
The experimental component can be configured to include more variables in the recommendation process, for example, season of the year (i.e., winter/fall or summer/spring), time of the day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
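The class-vector variant proposed above could be sketched as follows, with hypothetical helper names; each rating value gets its own prototype, and the predicted rating is simply the most similar class:

```python
import numpy as np

def build_class_vectors(recipe_vectors, ratings):
    """One prototype per rating value: the mean feature vector of all
    recipes the user rated with that value. A sketch of the proposed
    future-work variant, not an implementation from this thesis."""
    classes = {}
    for rating in set(ratings):
        members = [v for v, r in zip(recipe_vectors, ratings) if r == rating]
        classes[rating] = np.mean(members, axis=0)
    return classes

def predict_rating(classes, recipe_vector):
    """The predicted rating is the class whose prototype is most similar
    to the recipe, so no similarity-to-rating transformation is needed."""
    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom else 0.0
    return max(classes, key=lambda r: cosine(classes[r], recipe_vector))
```

One open question for such a variant is data sparsity: users with few ratings would have empty or single-recipe classes for some rating values.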
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting
to Know You Learning New User Preferences in Recommender Systems In Proceedings
50
of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN
1581134592 doi 101145502716502737
[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-
orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling
and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A
1026501525781
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
51
Chapter 5
Validation
This chapter contains the details and results of the experiments performed in this work. First, the
evaluation method and evaluation metrics are presented, followed by a discussion of the first
experimental results and baseline algorithms. In Section 5.3, a feature test is performed to determine
the features that are crucial for the best recommendations. In Section 5.4, a threshold variation test
is performed to adjust the algorithm and seek improvements in the recommendation results. Finally,
the last two sections focus on analysing two interesting aspects of the recommendation process, using
the algorithm that showed the best results.
5.1 Evaluation Metrics and Cross-Validation
Cross-validation was used to validate the recommendation components in this work. This technique
is mainly used in systems that seek to estimate how accurately a predictive model will perform in
practice [26]. The main goal of cross-validation is to isolate a segment of the known data; instead
of using it to train the model, this segment is used to evaluate the predictions made by the system
during the training phase. This procedure provides insight into how the model will generalize to an
independent dataset. More specifically, the leave-p-out cross-validation method was used, leveraging
p observations as the validation set and the remaining observations as the training set. To reduce
variability, this process is repeated multiple times, using different observations p as the validation
set. Ideally, this process is repeated until all possible combinations of p are tested. The validation
results are averaged over the number of times the process is repeated (see Fig. 5.1). In the
experiments performed in this work, the chosen configuration was 5 folds, so the process is repeated
5 times, which is also known as 5-fold cross-validation. For each fold, the validation set represents
20% and the training set the remaining 80% of the data.
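The fold-splitting procedure just described can be sketched as follows (a minimal illustration with names of our own choosing; the actual experiments would additionally shuffle the rating observations before splitting):

```python
def k_fold_splits(n_items, k=5):
    """Yield (train_idx, valid_idx) index pairs for k-fold cross-validation:
    each fold serves once as the validation set (~20% of the data for k=5)
    while the remaining folds form the training set (~80%)."""
    indices = list(range(n_items))
    fold_size = n_items // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder so every item is validated once.
        end = n_items if fold == k - 1 else start + fold_size
        valid_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        yield train_idx, valid_idx
```

Averaging the MAE/RMSE obtained on each of the k validation sets then gives the cross-validated error estimate.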
Figure 5.1: 10-Fold Cross-Validation example

Accuracy is measured by comparing the known data from the validation set with the outputs of the
system (i.e., the prediction values). In the simplest case, the validation set presents information in
the following format:
• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating
By providing the recommendation system with the userID and itemID as inputs, the algorithms
generate a prediction value (rating) for that item. This value is estimated based on the user's
previously rated items, learned from the training set.
Using the correct rating values obtained from the validation set and the predictions generated by
the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2,
these measures compute the deviation between the predicted ratings and the actual ratings. The
results obtained from the evaluation module are used to directly compare the performance of the
different recommendation components, as well as to validate new variations of content-based
algorithms.
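As a reference, the two measures can be computed directly from paired lists of actual and predicted ratings; a minimal sketch:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the rating deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE, but squaring the deviations
    places more emphasis on large prediction errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```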
5.2 Baselines and First Results
In order to validate the experimental content-based algorithms explored in this work, some baselines
first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components
presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a
few simple baseline metrics were also computed, using the direct values of specific dataset averages
as the predicted rating for the recommendations. The averages computed were the following: user
average rating, recipe average rating, and the combined average of the user and item averages, i.e.,
(UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the
recommendation system simply returns the userID average, the recipeID average, or the combination
of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                                    Epicurious          Food.com
                                    MAE      RMSE       MAE      RMSE
  YoLP Content-based component      0.6389   0.8279     0.3590   0.6536
  YoLP Collaborative component      0.6454   0.8678     0.3761   0.6834
  User Average                      0.6315   0.8338     0.4077   0.6207
  Item Average                      0.7701   1.0930     0.4385   0.7043
  Combined Average                  0.6628   0.8572     0.4180   0.6250

Table 5.2: Test Results

                                         Epicurious                        Food.com
                                  Observation     Observation      Observation     Observation
                                  User Average    Fixed Threshold  User Average    Fixed Threshold
                                  MAE     RMSE    MAE     RMSE     MAE     RMSE    MAE     RMSE
  User Avg + User Std. Dev.       0.8217  1.0606  0.7759  1.0283   0.4448  0.6812  0.4287  0.6624
  Item Avg + Item Std. Dev.       0.8914  1.1550  0.8388  1.1106   0.4561  0.7251  0.4507  0.7207
  User/Item Avg + User and
  Item Std. Dev.                  0.8304  1.0296  0.7824  0.9927   0.4390  0.6506  0.4324  0.6449
  Min-Max                         0.8539  1.1533  0.7721  1.0705   0.6648  0.9847  0.6303  0.9384
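The average-based baselines amount to little more than lookups in precomputed dictionaries; a sketch, with illustrative names for the average tables:

```python
def baseline_predict(user_id, recipe_id, user_avg, item_avg, mode="combined"):
    """Return a baseline rating prediction from precomputed averages.
    `user_avg` maps a userID to that user's average rating; `item_avg`
    maps a recipeID to that recipe's average rating."""
    if mode == "user":
        return user_avg[user_id]
    if mode == "item":
        return item_avg[recipe_id]
    # Combined Average baseline: (UserAvg + ItemAvg) / 2
    return (user_avg[user_id] + item_avg[recipe_id]) / 2
```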
As detailed in Section 4.3, the experimental recommendation component uses the well-known
Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building
the user's prototype vectors were presented: using the user average rating value as the threshold for
positive and negative observations, or simply using a fixed threshold in the middle of the rating
range, considering the highest rating values as positive observations and the lowest as negative.
These are referred to in Table 5.2 as Observation User Average and Observation Fixed Threshold.
Also detailed in Section 4.3, a few different methods are used to convert the similarity value returned
by Rocchio's algorithm into a rating value. These methods are represented in the line entries of
Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard
Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
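The fixed-threshold variant of the prototype construction can be sketched as follows (a simplified illustration: the data format is our own, and the Rocchio weighting and normalization factors of the actual component are omitted):

```python
from collections import defaultdict

def build_prototype(rated_recipes, threshold=3):
    """Build a user's prototype vector: the feature weights of positively
    rated recipes are added and those of negatively rated recipes are
    subtracted. `rated_recipes` is a list of (rating, features) pairs,
    where `features` maps a feature name to its weight in the recipe."""
    prototype = defaultdict(float)
    for rating, features in rated_recipes:
        sign = 1.0 if rating > threshold else -1.0
        for feature, weight in features.items():
            prototype[feature] += sign * weight
    return dict(prototype)
```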
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The
objective was to determine which method combination had the best performance, so it could be
further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the
user average as the threshold to build the prototype vectors results in higher error values than using
the fixed threshold of 3 to separate the positive and negative observations. The second conclusion
that can be drawn from these results is that using the combination of both user and item average
ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the
experimental methods with the best performances were identified and can now be further improved
and adjusted to return the best recommendations.

Table 5.3: Testing features

                                       Epicurious          Food.com
                                       MAE      RMSE       MAE      RMSE
  Ingredients + Cuisine + Dietaries    0.7824   0.9927     0.4324   0.6449
  Ingredients + Cuisine                0.7915   1.0012     0.4384   0.6502
  Ingredients + Dietary                0.7874   0.9986     0.4342   0.6468
  Cuisine + Dietary                    0.8266   1.0616     0.4324   0.7087
  Ingredients                          0.7932   1.0054     0.4411   0.6537
  Cuisine                              0.8553   1.0810     0.5357   0.7431
  Dietary                              0.8772   1.0807     0.4579   0.7320
5.3 Feature Testing
As detailed in Section 4.4, each recipe is characterized by the following features: ingredients,
cuisine, and dietary. In content-based methods, it is important to determine whether all features are
helping to obtain the best recommendations, so feature testing is crucial.
In the previous section, we concluded that the best-performing method combination was the
following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors;

• Use the combination of both user and item average ratings and standard deviations to
transform the similarity value into a rating value.
From this point on, all the experiments performed use this method combination.
Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially
for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested, so when computing the user prototype
vector, the features were separated and, in practice, 3 vectors were created and stored for each
user. This representation makes feature testing very easy to perform. For each recommendation,
when computing the cosine similarity between the user's prototype vector and the recipe's features,
the composition of the prototype vector can be controlled, as the 3 stored vectors can be easily
merged. In the tests presented in the previous section, the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective
line of Table 5.3.
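The per-feature storage described above makes each feature test a matter of merging dictionaries before the similarity computation; a sketch, assuming the three stored vectors use disjoint feature names:

```python
import math

def merge_vectors(*vectors):
    """Merge per-feature-type prototype vectors (e.g. ingredients,
    cuisine, dietary) into a single sparse vector."""
    merged = {}
    for vec in vectors:
        merged.update(vec)
    return merged

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Testing "Ingredients + Cuisine", for example, just means merging those two stored vectors and leaving the dietary vector out.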
Using more features to describe the items in content-based methods should, in theory, improve
the recommendations, since more information is available about them. Although this is confirmed
in this test (see Table 5.3), that may not always be the case. Some features, like for example the
price of the meal, can increase the correlation between the user preferences and items he dislikes,
so it is important to test the impact of every new feature before implementing it in the
recommendation system.

Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the
first experiments to transform the similarity value returned by Rocchio's algorithm into a rating
value:

    Rating = average rating + standard deviation,  if similarity >= U
    Rating = average rating,                       if L <= similarity < U
    Rating = average rating - standard deviation,  if similarity < L
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test
this method, but other cases now need to be tested. By varying the case limits, the objective of
this test is to study the impact on the recommendations and discover the similarity case thresholds
that return the lowest error values.
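A sketch of the mapping under test, with the thresholds exposed as parameters so they can be varied between cross-validation runs:

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Transform a Rocchio similarity score into a rating (Eq. 4.6);
    the defaults are the initial values of the thresholds U and L."""
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev
```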
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and
Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].

Figure 5.3: Lower similarity threshold variation test using the Food.com dataset
Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test, it is clear that the lower threshold only has a negative effect on the recommendation
accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value
seen in the graph of Fig. 5.2 occurs when the lower case (average rating minus standard deviation)
is completely removed.
As a result of these tests, Eq. 4.6 was updated to:

    Rating = average rating + standard deviation,  if similarity >= U
    Rating = average rating,                       if similarity < U        (5.1)

Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test
results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented
by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation
tests multiple times on the experimental recommendation component, adjusting the upper similarity
value between each test.
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation
between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more
emphasis on higher deviations. These definitions help to understand the results of this test. Both
datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE
decreases and the RMSE increases. When lowering the similarity threshold, the recommendation
system predicts the correct rating value more often, which results in a lower average error, so the
MAE is lower. But although it is predicting the exact rating value more often, in the cases where it
misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE
places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold
is subjective: some systems may benefit more from a higher rate of exact predictions, while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test,
the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE.
With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE was 0.6230.

Figure 5.6: Mapping of the user's absolute error and standard deviation from the Epicurious dataset
Compared directly with the YoLP content-based component, which obtained the overall lowest
error rates of all the baselines, the experimental recommendation component showed better results
when using the Food.com dataset.
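The opposing movement of the two measures observed in the threshold test can be reproduced with a toy example: two predictors with the same MAE, where the one that is usually exact but occasionally far off gets the worse RMSE.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4.0, 4.0, 4.0, 4.0]
pred_close = [3.5, 4.5, 3.5, 4.5]  # never exact, always off by 0.5
pred_spiky = [4.0, 4.0, 4.0, 2.0]  # three exact hits, one large miss

# Both predictors have MAE 0.5, but the single large miss
# doubles the RMSE of the second one (0.5 vs. 1.0).
```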
5.5 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings, the user standard deviation plays an important
role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that
attributed the same rating to all their reviews, should have the lowest impact on the recommendation
error. The objective of this test is to discover how significant the impact of this variable is, and
whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to that
user's absolute error and standard deviation values. The line in these two graphs indicates the
average value of the points in its proximity.
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected,
since it is normal for the absolute error to slowly increase for users with higher standard deviations.
It would not be good if a spike in the absolute error were noted towards the higher values of the
standard deviation, which would imply that the recommendation algorithm was having a very small
impact on the predicted ratings. Taking into consideration the small dimensionality of this dataset,
and the lighter density of points in the graph towards the higher values of standard deviation, there
was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the user's absolute error and standard deviation from the Food.com dataset
Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users
with standard deviations higher than 1. This implies that the algorithm is learning the users'
preferences and returning good recommendations even to users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences
and the recipe features. Since the user's preferences in a real system are built over time, the
objective of this test is to simulate the continuous learning of the algorithm, using the datasets
studied in this work, and to analyse whether the recommendation error starts to converge after a
determined amount of reviews. In order to perform this test, the datasets were first analysed to find
a group of users with enough rated recipes to study the improvements in the recommendations. The
Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was
the highest chosen threshold for this dataset, in order to maintain a considerable amount of users to
average the recommendation errors from (see Fig. 5.8). In Food.com, 1571 users were found that
rated over 100 recipes, and since the results of this experiment showed a consistent drop in the
errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500
recipes, as seen in Fig. 5.10.
The training set represents the recipes that are used to build the users' prototype vectors, so for
each round an additional review is added to the training set and removed from the validation set,
in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
error and, after 25 rated recipes, the error fluctuates around the same values. Although it would be
interesting to perform this experiment with a higher number of rated recipes, to see the progress of
the recommendation error, due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant
improvement in the recommendations, although there is not a clear number of rated recipes that
marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes
Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes
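The round-by-round simulation can be sketched as a generic loop; `build_model` and `evaluate` are placeholders of our own for the prototype-vector construction and the MAE/RMSE computation:

```python
def learning_curve(user_reviews, build_model, evaluate):
    """Simulate continuous learning for one user: at each round, one more
    review moves from the validation set into the training set, and the
    error on the remaining (held-out) reviews is recorded."""
    errors = []
    for n in range(1, len(user_reviews)):
        train, validation = user_reviews[:n], user_reviews[n:]
        model = build_model(train)
        errors.append(evaluate(model, validation))
    return errors
```

Averaging these per-user error curves over all users above a given review-count threshold yields curves like those in Figs. 5.8 to 5.10.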
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food
recommendation was explored. Using the well-known Rocchio algorithm, several approaches were
tested to further explore the breaking down of recipes into ingredients presented in [22], and to use
more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe
Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so
various approaches were tested to build the users' prototype vectors and to transform the similarity
value returned by the algorithm into a rating value, needed to compute the performance of the
recommendation system. When building the prototype vectors, the approach that returned the best
results used a fixed threshold to differentiate positive and negative observations. The combination
of both user and item average ratings and standard deviations demonstrated the best results for
transforming the similarity value into a rating value. Combined, these approaches returned the best
performance values of the experimental recommendation component.
After determining the best approach to adapt the Rocchio algorithm to food recommendations,
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results. The final results of the experimental component showed improvements in
the recommendation performance when using the Food.com dataset. With the Epicurious dataset,
some baselines, like the content-based method implemented in YoLP, registered lower error values.
Since these are two datasets with very different characteristics, not improving on the baseline results
in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information
only contained the main ingredients, which were chosen by the user at the moment of the review, as
opposed to the full ingredient information that recipes have in the Food.com dataset. This removes
a lot of detail, both in the recipes and in the prototype vectors; adding the major difference in the
dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated.
Since there are very few studies related to food recommendation, the features that best describe the
recipes are still undefined. The feature study performed in this work, which explored all the features
available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features
combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method
explored in this work would be an interesting experiment for future work. As mentioned in
Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured
with the MAE metric, was obtained when compared to a pure content-based method [20]. This
experiment would determine if a similar decrease in the MAE could be achieved by implementing
this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation
process, for example, the season of the year (i.e., winter/fall or summer/spring), the time of day
(i.e., lunch or dinner), total meal cost, and total calories, amongst others. The study of the impact
that these features have on the recommendations is another interesting direction to pursue in the
future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user
could represent their preferences. From the user's rated recipes, each class would contain the
feature weights related to a specific rating value observation. When recommending a recipe, its
feature vector is compared with the user's set of vectors, so that, according to the user's
preferences, the vector with the highest similarity represents the class where the recipe fits best.

Using this method removes the need to transform the similarity measure into a rating, since the
class with the highest similarity to the targeted recipe would automatically attribute to it a
predicted rating.
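A sketch of this multi-class variant (the data format is illustrative): one aggregated vector per rating value, with the predicted rating being the class of highest cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(class_vectors, recipe_features):
    """`class_vectors` maps each rating value (e.g. 1..5) to the aggregated
    feature-weight vector of the recipes the user rated with that value;
    the most similar class vector directly yields the predicted rating."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_features))
```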
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and
systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868.
doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction,
volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized
cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science
and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages
76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US,
Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web,
4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive
Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification.
In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization.
In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420,
1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for
collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial
Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A
Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data
Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender
Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction
Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages
395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International Conference on World Wide
Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl.
Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of
the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592.
doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic Memory-Based
Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.
ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and
User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by
Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of
the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In
Proceedings of the 18th International Conference on User Modeling, Adaptation, and
Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696.
doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and
User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM
Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe
Recommendation, volume 8444. 2014.

[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
ISSN 1045-0823.
Figure 51 10 Fold Cross-Validation example
in the following format:

• User identification: userID
• Item identification: itemID
• Rating attributed by the userID to the itemID: rating

By providing the recommendation system with the userID and itemID as inputs, the algorithms generate a prediction value (rating) for that item. This value is estimated based on the user's previously rated items, learned from the training set.

Using the correct rating values obtained from the validation set and the predictions generated by the algorithms, the MAE and RMSE measures can be computed. As previously mentioned in Section 2.2, these measures compute the deviation between the predicted ratings and the actual ratings. The results obtained from the evaluation module are used to directly compare the performance of the different recommendation components, as well as to validate new variations of content-based algorithms.
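For reference, the two evaluation measures can be sketched as follows (a minimal illustration, not the thesis code):

```python
import math

def mae_rmse(predictions, actual):
    """Compute MAE and RMSE between predicted and actual ratings."""
    errors = [p - a for p, a in zip(predictions, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

Because RMSE squares each deviation before averaging, it penalizes large misses more heavily than MAE, a distinction that matters in the threshold experiments below.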
5.2 Baselines and First Results

In order to validate the experimental content-based algorithms explored in this work, some baselines first need to be computed. Using 5-fold cross-validation, the YoLP recommendation components presented in the previous chapter (Sections 4.1 and 4.2) were evaluated. Besides these methods, a few simple baselines were also computed, using the direct values of specific dataset averages as the predicted rating for the recommendations. The averages computed were the following: user average rating, recipe average rating, and the combined average of the user and item averages, i.e. (UserAvg + ItemAvg)/2. In other words, when receiving the userID and recipeID as inputs, the recommendation system simply returns the userID average, the recipeID average, or the combination of both. Table 5.1 contains the MAE and RMSE values for the baseline methods.

Table 5.1: Baselines

                               Epicurious         Food.com
                               MAE     RMSE       MAE     RMSE
YoLP Content-based component   0.6389  0.8279     0.3590  0.6536
YoLP Collaborative component   0.6454  0.8678     0.3761  0.6834
User Average                   0.6315  0.8338     0.4077  0.6207
Item Average                   0.7701  1.0930     0.4385  0.7043
Combined Average               0.6628  0.8572     0.4180  0.6250

Table 5.2: Test results

                                  Epicurious                           Food.com
                                  Observation:     Observation:        Observation:     Observation:
                                  User Average     Fixed Threshold     User Average     Fixed Threshold
                                  MAE     RMSE     MAE     RMSE        MAE     RMSE     MAE     RMSE
User Avg + User Standard
Deviation                         0.8217  1.0606   0.7759  1.0283      0.4448  0.6812   0.4287  0.6624
Item Avg + Item Standard
Deviation                         0.8914  1.1550   0.8388  1.1106      0.4561  0.7251   0.4507  0.7207
User/Item Avg + User and
Item Standard Deviation           0.8304  1.0296   0.7824  0.9927      0.4390  0.6506   0.4324  0.6449
Min-Max                           0.8539  1.1533   0.7721  1.0705      0.6648  0.9847   0.6303  0.9384
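These average-based baselines can be sketched as follows (the dictionary arguments are illustrative stand-ins for the per-user and per-recipe averages computed from the training folds):

```python
def baseline_predict(user_id, recipe_id, user_avgs, item_avgs, mode="combined"):
    """Predict a rating from stored per-user and per-recipe averages."""
    if mode == "user":
        return user_avgs[user_id]
    if mode == "item":
        return item_avgs[recipe_id]
    # Combined baseline: (UserAvg + ItemAvg) / 2
    return (user_avgs[user_id] + item_avgs[recipe_id]) / 2
```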
As detailed in Section 4.3, the experimental recommendation component uses the well-known Rocchio algorithm and seeks to adapt it to food recommendations. Two distinct ways of building the users' prototype vectors were presented: using the user's average rating value as the threshold for positive and negative observations, or simply using a fixed threshold in the middle of the rating range, considering the highest rating values as positive observations and the lowest as negative. These are referred to in Table 5.2 as Observation: User Average and Observation: Fixed Threshold. As also detailed in Section 4.3, a few different methods are used to convert the similarity value returned by the Rocchio algorithm into a rating value. These methods are represented in the row entries of Table 5.2 and are referred to as User Avg + User Standard Deviation, Item Avg + Item Standard Deviation, User/Item Avg + User and Item Standard Deviation, and Min-Max.
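A Rocchio-style prototype construction with the fixed-threshold observation split can be sketched as follows (a simplified illustration with unit weights and dictionaries as sparse vectors; the actual component also applies Inverse Recipe Frequency weighting, and a Rocchio implementation may additionally scale the positive and negative contributions):

```python
def build_prototype(rated_recipes, threshold=3):
    """Rocchio-style prototype: add the feature vectors of positively
    rated recipes and subtract those of negatively rated ones,
    using a fixed rating threshold to split the observations."""
    prototype = {}
    for features, rating in rated_recipes:
        sign = 1.0 if rating >= threshold else -1.0
        for feature, weight in features.items():
            prototype[feature] = prototype.get(feature, 0.0) + sign * weight
    return prototype
```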
Table 5.2 contains the first test results of the experiments, using 5-fold cross-validation. The objective was to determine which method combination had the best performance, so that it could be further adjusted and improved. When observing the MAE and RMSE values, it is clear that using the user average as the threshold to build the prototype vectors results in higher error values than using the fixed threshold of 3 to separate the positive and negative observations. The second conclusion that can be drawn from these results is that using the combination of both user and item average ratings and standard deviations yields the overall lowest error values.
Although the first results do not surpass most of the baselines in terms of performance, the experimental methods with the best performances were identified and can now be further improved and adjusted to return the best recommendations.

Table 5.3: Testing features

                                    Epicurious         Food.com
                                    MAE     RMSE       MAE     RMSE
Ingredients + Cuisine + Dietaries   0.7824  0.9927     0.4324  0.6449
Ingredients + Cuisine               0.7915  1.0012     0.4384  0.6502
Ingredients + Dietary               0.7874  0.9986     0.4342  0.6468
Cuisine + Dietary                   0.8266  1.0616     0.4324  0.7087
Ingredients                         0.7932  1.0054     0.4411  0.6537
Cuisine                             0.8553  1.0810     0.5357  0.7431
Dietary                             0.8772  1.0807     0.4579  0.7320
5.3 Feature Testing

As detailed in Section 4.4, each recipe is characterized by the following features: ingredients, cuisine, and dietary. In content-based methods it is important to determine whether all features are helping to obtain the best recommendations, so feature testing is crucial.

In the previous section, we concluded that the method combination that performed best was the following:

• Use the rating value 3 as a fixed threshold to distinguish positive and negative observations and build the prototype vectors.
• Use the combination of both user and item average ratings and standard deviations to transform the similarity value into a rating value.

From this point on, all the experiments performed use this method combination.

Computing the users' prototype vectors for all 5 folds is a time-consuming process, especially for the Food.com dataset. With these feature tests in mind, the goal was to avoid rebuilding the prototype vectors for each feature combination to be tested, so when computing the user prototype vector the features were separated and, in practice, 3 vectors were created and stored for each user. This representation makes feature testing very easy to perform. For each recommendation, when computing the cosine similarity between the user's prototype vector and the recipe's features, the composition of the prototype vector can be controlled, as the 3 stored vectors can easily be merged. In the tests presented in the previous section, the prototype vector was built using all features (Ingredients + Cuisine + Dietaries), so the same results can be observed in the respective row of Table 5.3.
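The per-feature storage scheme can be sketched as follows (dictionaries as sparse vectors; names are illustrative, and the merge assumes the three feature name spaces are disjoint):

```python
import math

def merge_vectors(*vectors):
    """Merge the stored per-feature prototype vectors (e.g. ingredients,
    cuisine, dietary) selected for a given feature combination."""
    merged = {}
    for vector in vectors:
        merged.update(vector)  # feature name spaces assumed disjoint
    return merged

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Testing, say, Ingredients + Cuisine then only requires merging those two stored vectors before the similarity computation, with no prototype rebuilding.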
Using more features to describe the items in content-based methods should, in theory, improve the recommendations, since more information is available about the items, and although this is confirmed in this test (see Table 5.3), that may not always be the case. Some features, like, for example, the price of the meal, can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.

Figure 5.2: Lower similarity threshold variation test, using the Epicurious dataset.
5.4 Similarity Threshold Variation

Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by the Rocchio algorithm into a rating value:

Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if L <= similarity < U      (4.6)
         average rating - standard deviation,  if similarity < L

The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendations and discover the similarity thresholds that return the lowest error values.
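A direct transcription of this conversion rule (Eq. 4.6) might look like:

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Map a similarity value to a rating, Eq. 4.6 style: push the
    prediction above/below the average according to the thresholds."""
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev
```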
Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values, using the Epicurious and Food.com datasets respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test, using the Food.com dataset.

Figure 5.4: Upper similarity threshold variation test, using the Epicurious dataset.

From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph from Fig. 5.2 occurs when the lower case (average rating - standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:

Rating = average rating + standard deviation,  if similarity >= U
         average rating,                       if similarity < U           (5.1)

Figure 5.5: Upper similarity threshold variation test, using the Food.com dataset.
Using Eq. 5.1, it is time to test the upper similarity threshold. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.

The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. But although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher, and since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted ratings and the actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE was 0.6230.

Compared directly with the YoLP Content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.

Figure 5.6: Mapping of the users' absolute error and standard deviation, from the Epicurious dataset.
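The threshold sweep itself can be sketched as follows (the evaluate callback, which would wrap one full cross-validation run at a given U, is a hypothetical stand-in):

```python
def sweep_upper_threshold(evaluate, thresholds):
    """Evaluate once per candidate upper threshold U and keep the
    settings with the lowest MAE and the lowest RMSE separately,
    since the two measures favor different thresholds here."""
    results = {u: evaluate(u) for u in thresholds}  # u -> (mae, rmse)
    best_mae = min(thresholds, key=lambda u: results[u][0])
    best_rmse = min(thresholds, key=lambda u: results[u][1])
    return results, best_mae, best_rmse
```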
5.5 Standard Deviation Impact on Recommendation Error

When recommending items using predicted ratings, the users' standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e. users that attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.

In Figures 5.6 and 5.7, each point represents a user, positioned on the graph according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in that proximity.
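The quantities plotted in these figures can be computed per user as follows (a sketch; the data structures are illustrative):

```python
import statistics

def user_error_vs_std(reviews, predictions):
    """For each user, pair the standard deviation of their ratings with
    the mean absolute error of the predictions made for them."""
    points = {}
    for user, ratings in reviews.items():
        errors = [abs(p - a) for p, a in zip(predictions[user], ratings)]
        std = statistics.pstdev(ratings)  # 0.0 when all ratings are equal
        points[user] = (std, sum(errors) / len(errors))
    return points
```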
Fig. 5.6 represents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to slowly increase for users with higher standard deviations. It would not be good if a spike in the absolute error were noted towards the higher values of the standard deviation, which would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small dimension of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of the users' absolute error and standard deviation, from the Food.com dataset.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve

The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since the user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a determined number of reviews. In order to perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users from which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes.

The training set represents the recipes that are used to build the users' prototype vectors, so for each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with a higher number of rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes.
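The simulation loop can be sketched as follows (build_prototype and evaluate are hypothetical stand-ins for the component's actual routines):

```python
def learning_curve(rated_recipes, build_prototype, evaluate):
    """Simulate continuous learning: move one review at a time from the
    validation set into the training set and re-evaluate the error."""
    errors = []
    for n in range(1, len(rated_recipes)):
        training, validation = rated_recipes[:n], rated_recipes[n:]
        prototype = build_prototype(training)
        errors.append(evaluate(prototype, validation))  # e.g. MAE
    return errors
```

The returned list gives one error value per training-set size, which is what each point on the learning-curve figures represents (averaged over the selected users).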
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.

Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.

After determining the best approach to adapt the Rocchio algorithm to food recommendations, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. These being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, which were chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both in the recipes and in the prototype vectors, and adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e. that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendations, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work

Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e. winter/fall or summer/spring), the time of day (i.e. lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector would be compared with the user's set of vectors, so that, according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
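This future-work variant can be sketched as follows (a sketch; class_prototypes is assumed to map each rating value to that user's corresponding class vector, and the similarity function is passed in):

```python
def predict_rating(recipe_features, class_prototypes, similarity):
    """One-prototype-per-rating-class variant: the class whose prototype
    is most similar to the recipe gives the predicted rating directly,
    with no separate similarity-to-rating conversion step."""
    return max(class_prototypes,
               key=lambda rating: similarity(class_prototypes[rating],
                                             recipe_features))
```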
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. Mcnee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation, volume 8444, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
51
Table 51 Baselines
Epicurious FoodcomMAE RMSE MAE RMSE
YoLP Content-basedcomponent
06389 08279 03590 06536
YoLP Collaborativecomponent
06454 08678 03761 06834
User Average 06315 08338 04077 06207Item Average 07701 10930 04385 07043Combined Average 06628 08572 04180 06250
Table 52 Test Results
Epicurious FoodcomObservationUser Average
ObservationFixed Thresh-old
User AverageObservation
ObservationFixed Thresh-old
MAE RMSE MAE RMSE MAE RMSE MAE RMSEUser Avg + User Stan-dard Deviation
08217 10606 07759 10283 04448 06812 04287 06624
Item Avg + Item Stan-dard Deviation
08914 11550 08388 11106 04561 07251 04507 07207
UserItem Avg + Userand Item Standard De-viation
08304 10296 07824 09927 04390 06506 04324 06449
Min-Max 08539 11533 07721 10705 06648 09847 06303 09384
inputs the recommendation system simply returns the userID average or the recipeID average or
the combination of both Table 51 contains the MAE and RMSE values for the baseline methods
As detailed in Section 43 the experimental recommendation component uses the well-known
Rocchiorsquos algorithm and seeks to adapt it to food recommendations Two distinct ways of building
the userrsquos prototype vectors were presented using the user average rating value as threshold for
positive and negative observations or simply using a fixed threshold in the middle of the rating
range considering as positive observations the highest rating values and as negative the lowest
These are referred in Table 52 as Observation User Average and Observation Fixed Threshold
Also detailed in Section 43 a few different methods are used to convert the similarity value returned
from Rocchiorsquos algorithm into a rating value These methods are represented in the line entries of
the Table 52 and are referred to as User Avg + User Standard Deviation Item Avg + Item Standard
Deviation UserItem Avg + User and Item Standard Deviation and Min-Max
Table 52 contains the first test results of the experiments using 5-fold cross-validation The
objective was to determine which method combination had the best performance so it could be
further adjusted and improved When observing the MAE and RMSE values it is clear that using the
user average as threshold to build the prototype vectors results in higher error values than the fixed
threshold of 3 to separate the positive and negative observations The second conclusion that can
be made from these results is that using the combination of both user and item average ratings and
standard deviations has the overall lowest error values
Although the first results do not surpass most of the baselines in terms of performance the
experimental methods with the best performances were identified and can now be further improved
37
Table 53 Testing features
Epicurious FoodcomMAE RMSE MAE RMSE
Ingredients + Cuisine +Dietaries
07824 09927 04324 06449
Ingredients + Cuisine 07915 10012 04384 06502Ingredients + Dietary 07874 09986 04342 06468Cuisine + Dietary 08266 10616 04324 07087Ingredients 07932 10054 04411 06537Cuisine 08553 10810 05357 07431Dietary 08772 10807 04579 07320
and adjusted to return the best recommendations
53 Feature Testing
As detailed in Section 44 each recipe is characterized by the following features ingredients cuisine
and dietary In content-based methods it is important to determine if all features are helping to obtain
the best recommendations so feature testing is crucial
In the previous Section we concluded that the method combination that performed the best was
the following
bull Use the rating value 3 as a fixed threshold to distinguish positive and negative observations
and build the prototype vectors
bull Use the combination of both user and item average ratings and standard deviations to trans-
form the similarity value into a rating value
From this point on all the experiments performed use this method combination
Computing the userrsquos prototype vectors for all 5 folds is a time consuming process especially
for the Foodcom dataset With these feature tests in mind the goal was to avoid rebuilding the
prototype vectors for each feature combination to be tested so when computing the user prototype
vector the features where separated and in practice 3 vectors were created and stored for each
user This representation makes feature testing very easy to perform For each recommendation
when computing the cosine similarity between the userrsquos prototype vector and the recipersquos features
the composition of the prototype vector can be controlled as the 3 stored vectors can be easily
merged In the tests presented in the previous section the prototype vector was built using all
features (Ingredients + Cuisine + Dietaries) so the same results can be observed in the respective
line of Table 53
Using more features to describe the items in content-based methods should in theory improve
the recommendations since we have more information available about them and although this is
confirmed in this test see Table 53 that may not always be the case Some features like for
38
Figure 52 Lower similarity threshold variation test using Epicurious dataset
example the price of the meal can increase the correlation between the user preferences and items
he dislikes so it is important to test the impact of every new feature before implementing it in the
recommendation system
54 Similarity Threshold Variation
Eq 46 previously presented in Section 433 and repeated here for convenience was used in the
first experiments to transform the similarity value returned by the Rocchiorsquos algorithm into a rating
value
Rating =
average rating + standard deviation if similarity gt= U
average rating if L lt= similarity lt U
average rating minus standard deviation if similarity lt L
The initial values for the thresholds 075 for U and 025 for L were good starting values to test
this method but now other cases need to be tested By fluctuating the case limits the objective of
this test is to study the impact in the recommendation and discover the similarity case thresholds
that return the lowest error values
Figures 52 and 53 illustrate the changes in MAE and RMSE values using the Epicurious and
Foodcom datasets respectively when adjusting the lower similarity threshold L in the range [0 025]
39
Figure 53 Lower similarity threshold variation test using Foodcom dataset
Figure 54 Upper similarity threshold variation test using Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect in the recommendation
accuracy and subtracting the standard deviation does not help The accentuated drop in error value
seen in the graph from Fig 52 occurs when the lower case (average rating minus standard deviation)
is completely removed
As a result of these tests Eq 46 was updated to
40
Figure 55 Upper similarity threshold variation test using Foodcom dataset
Rating =
average rating + standard deviation if similarity gt= U
average rating if similarity lt U
(51)
Using Eq. 5.1, the upper similarity threshold was then tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity threshold between tests.
The results obtained were interesting. As mentioned in Section 2.2, the MAE computes the deviation between predicted and actual ratings; the RMSE is very similar, but places more emphasis on larger deviations. These definitions help to explain the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the exact rating value more often, which results in a lower average error, so the MAE is lower. However, in the cases where it misses, the deviation between the predicted and the actual rating is larger, and since the RMSE places more emphasis on larger deviations, the RMSE values increase. The best similarity threshold is therefore subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted and actual ratings is more suitable. In this test,
Figure 5.6: Mapping of each user's absolute error and standard deviation in the Epicurious dataset
the lowest values registered with the Epicurious dataset were an MAE of 0.6544 and an RMSE of 0.8601. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230. Compared directly with the YoLP content-based component, which obtained the lowest overall error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
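The opposite movement of the two metrics can be reproduced with a toy example (the numbers below are illustrative only, not taken from the experiments): a predictor that is exactly right most of the time but occasionally misses badly obtains a lower MAE but a higher RMSE than one that is never exact but always close.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average absolute deviation."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: squaring emphasises larger deviations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 4, 4, 4]
mostly_exact = [4, 4, 4, 1]   # three exact hits, one large miss
always_close = [3, 3, 3, 3]   # never exact, always off by one

# MAE favours the mostly-exact predictor (0.75 vs 1.0),
# while RMSE favours the always-close one (1.5 vs 1.0).
```

This is exactly the trade-off observed when lowering U: more exact predictions (lower MAE) at the cost of larger individual misses (higher RMSE).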
5.5 Standard Deviation Impact on the Recommendation Error
When recommending items using predicted ratings, the user's standard deviation plays an important role in the recommendation error. Users with a standard deviation equal to zero, i.e., users that attributed the same rating in all of their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to that user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
Fig. 5.6 presents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher standard deviation values would have been a bad sign, as it would imply that the recommendation algorithm was having very little impact on the predicted ratings. Considering the small size of this dataset, and the lower density of points in the graph towards the higher standard deviation values, probably
Figure 5.7: Mapping of each user's absolute error and standard deviation in the Food.com dataset
there was not enough data on users with high deviations for the absolute error to stagnate.
Fig. 5.7 presents good results, since it shows that the users' absolute error starts to stagnate for standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvement in the recommendations. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to keep a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes, and since the results of this experiment showed a consistent drop in the measured errors, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in
Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes
Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes
the error, and after 25 rated recipes the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with more rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendations, although there is no clear number of rated recipes that marks
Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
a threshold where the recommendation error stagnates.
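The round-based protocol described above, with one review migrating from the validation set to the training set per round, can be sketched as follows. Here `train_and_score` stands in for the actual fitting of the prototype vectors and the error measurement; it is a hypothetical callback, not the thesis's implementation.

```python
def learning_curve(user_reviews, train_and_score):
    """Simulate incremental learning for one user: in round n, the first n
    reviews form the training set and the remaining ones the validation set.
    Returns the per-round error values."""
    errors = []
    for n in range(1, len(user_reviews)):
        train = user_reviews[:n]
        validation = user_reviews[n:]
        errors.append(train_and_score(train, validation))
    return errors
```

Averaging these per-round errors over all users with enough reviews yields curves such as those in Figs. 5.8 to 5.10.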
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of features, with weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into the rating value needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
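A minimal sketch of this configuration, assuming a simple additive Rocchio update (the exact weighting used in the experiments is described in Chapter 4 and is not reproduced here): recipes rated above the fixed threshold of 3 add their feature weights to the prototype, the remaining ones subtract them, and predictions compare the prototype and recipe vectors with the cosine similarity.

```python
import math
from collections import defaultdict

def build_prototype(rated_recipes, threshold=3):
    """Rocchio-style user prototype from (feature-weight dict, rating) pairs.

    Ratings above the fixed threshold count as positive observations and the
    others as negative ones -- a simplifying assumption for illustration."""
    prototype = defaultdict(float)
    for features, rating in rated_recipes:
        sign = 1.0 if rating > threshold else -1.0
        for name, weight in features.items():
            prototype[name] += sign * weight
    return dict(prototype)

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v.get(name, 0.0) for name, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

The resulting similarity is then mapped to a rating using the user and item statistics, as in Eq. 5.1.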
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors, and adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the
experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that using all features combined outperforms every individual feature and the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, implementing this hybrid approach yielded a performance improvement of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
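The hybrid idea of [20] can be sketched as follows, assuming hypothetical helpers: the sparse user-item rating matrix is first densified with content-based predictions, and collaborative filtering then runs on the full matrix. Here `content_predict` stands in for the content-based method explored in this work.

```python
def content_boost(known_ratings, content_predict, users, items):
    """Densify a sparse user-item rating dict with content-based predictions.

    known_ratings: {(user, item): rating}; pairs without an observed rating
    are filled in by content_predict(user, item), so a collaborative
    filtering step can afterwards operate on a complete matrix."""
    dense = {}
    for user in users:
        for item in items:
            key = (user, item)
            dense[key] = known_ratings.get(key, content_predict(user, item))
    return dense
```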
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), the total meal cost, or the total calories, amongst others. The study of the impact that these features have on the recommendations is another interesting direction to pursue in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
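A sketch of this class-vector variant, under the assumption that each class vector simply accumulates the feature weights of the recipes rated with that value; the helper names and the pluggable `similarity` callback are illustrative.

```python
from collections import defaultdict

def build_class_vectors(rated_recipes):
    """One aggregated feature vector per rating value observed for the user."""
    classes = defaultdict(lambda: defaultdict(float))
    for features, rating in rated_recipes:
        for name, weight in features.items():
            classes[rating][name] += weight
    return {rating: dict(vec) for rating, vec in classes.items()}

def predict_with_classes(class_vectors, recipe_features, similarity):
    """The predicted rating is the rating class whose vector is most similar
    to the recipe's feature vector; no similarity-to-rating mapping needed."""
    return max(class_vectors,
               key=lambda rating: similarity(class_vectors[rating],
                                             recipe_features))
```

Note how the final `max` directly yields a rating value, which is precisely what removes the transformation step required by the single-prototype approach.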
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research, and systems. User Modeling and User-Adapted Interaction, 11:203-259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98-105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76-87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7. URL http://link.springer.com/chapter/10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3. URL http://link.springer.com/10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic text processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325-341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9. URL http://link.springer.com/10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551-585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998. ISBN 0897915240.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412-420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395-403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071. URL http://dl.acm.org/citation.cfm?id=372071.
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127-134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56-69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331-370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187-192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519-523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 LNCS, pages 381-386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147-180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation. Volume 8444, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137-1143, 1995. ISSN 1045-0823.
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms
In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403
1998
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting
to Know You Learning New User Preferences in Recommender Systems In Proceedings
50
of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN
1581134592 doi 101145502716502737
[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-
orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling
and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A
1026501525781
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
51
Figure 5.2: Lower similarity threshold variation test using the Epicurious dataset
For example, the price of the meal can increase the correlation between the user's preferences and items he dislikes, so it is important to test the impact of every new feature before implementing it in the recommendation system.
5.4 Similarity Threshold Variation
Eq. 4.6, previously presented in Section 4.3.3 and repeated here for convenience, was used in the first experiments to transform the similarity value returned by Rocchio's algorithm into a rating value:

Rating =
    average rating + standard deviation,  if similarity ≥ U
    average rating,                       if L ≤ similarity < U
    average rating − standard deviation,  if similarity < L
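For concreteness, this three-case mapping can be sketched as a small helper (a hypothetical sketch, not the thesis implementation; the default thresholds are the initial U and L values used in the experiments):

```python
def similarity_to_rating(similarity, avg_rating, std_dev, upper=0.75, lower=0.25):
    """Map a Rocchio similarity score onto a rating via the three-case rule.

    avg_rating and std_dev are the rating statistics of the user (or the
    combined user/item statistics); upper (U) and lower (L) are the
    similarity thresholds being tuned in this section.
    """
    if similarity >= upper:
        return avg_rating + std_dev
    if similarity >= lower:
        return avg_rating
    return avg_rating - std_dev
```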
The initial values for the thresholds, 0.75 for U and 0.25 for L, were good starting values to test this method, but other cases now need to be tested. By varying the case limits, the objective of this test is to study the impact on the recommendation and to discover the similarity thresholds that return the lowest error values.

Figures 5.2 and 5.3 illustrate the changes in MAE and RMSE values using the Epicurious and Food.com datasets, respectively, when adjusting the lower similarity threshold L in the range [0, 0.25].
Figure 5.3: Lower similarity threshold variation test using the Food.com dataset

Figure 5.4: Upper similarity threshold variation test using the Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect on the recommendation accuracy, and that subtracting the standard deviation does not help. The accentuated drop in error value seen in the graph of Fig. 5.2 occurs when the lower case (average rating − standard deviation) is completely removed.

As a result of these tests, Eq. 4.6 was updated to:
Figure 5.5: Upper similarity threshold variation test using the Food.com dataset
Rating =
    average rating + standard deviation,  if similarity ≥ U
    average rating,                       if similarity < U
(5.1)
Using Eq. 5.1, the upper similarity threshold can now be tested. Figures 5.4 and 5.5 present the test results for the Epicurious and Food.com datasets, respectively. For each similarity value, represented by the points in the graphs, the MAE and RMSE were obtained by running the same cross-validation tests multiple times on the experimental recommendation component, adjusting the upper similarity value between each test.
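The sweep procedure can be sketched as follows (an illustrative sketch only: the sample layout and threshold grid are assumptions, with the two-case rule of Eq. 5.1 inlined):

```python
import math

def sweep_upper_threshold(samples, thresholds):
    """For each candidate upper threshold U, apply the two-case rule of
    Eq. 5.1 to held-out samples and report (MAE, RMSE).

    samples: list of (similarity, avg_rating, std_dev, actual_rating) tuples,
             an assumed layout for the held-out user/recipe pairs.
    """
    results = {}
    for u in thresholds:
        errors = []
        for sim, avg, std, actual in samples:
            predicted = avg + std if sim >= u else avg
            errors.append(predicted - actual)
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[u] = (mae, rmse)
    return results
```

Plotting MAE and RMSE per threshold value then reproduces the curves in Figures 5.4 and 5.5.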
The results obtained were interesting. As mentioned in Section 2.2, MAE computes the deviation between predicted ratings and actual ratings; RMSE is very similar to MAE, but places more emphasis on higher deviations. These definitions help to understand the results of this test. Both datasets react in a very similar way when the threshold U is varied in the interval [0, 0.75]: the MAE decreases and the RMSE increases. When lowering the similarity threshold, the recommendation system predicts the correct rating value more often, which results in a lower average error, so the MAE is lower. However, although it predicts the exact rating value more often, in the cases where it misses, the deviation between the predicted rating and the actual rating is higher; since RMSE places more emphasis on higher deviations, the RMSE values increase. The best similarity threshold is subjective: some systems may benefit more from a higher rate of exact predictions, while for others a lower deviation between the predicted and actual ratings is more suitable. In this test, the lowest results registered using the Epicurious dataset were 0.6544 MAE and 0.8601 RMSE. With the Food.com dataset, the lowest MAE registered was 0.3229 and the lowest RMSE 0.6230.

Figure 5.6: Mapping of each user's absolute error and standard deviation in the Epicurious dataset

Compared directly with the YoLP content-based component, which obtained the overall lowest error rates of all the baselines, the experimental recommendation component showed better results when using the Food.com dataset.
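The MAE/RMSE trade-off described above can be illustrated with a small made-up example: a predictor with more exact hits but larger misses can have a lower MAE and a higher RMSE than a more conservative one (the error vectors below are invented for illustration):

```python
import math

def mae(errors):
    """Mean absolute error over a list of (predicted - actual) deviations."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error; squaring emphasises the larger deviations."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Low threshold: predicts the exact rating more often, but misses are larger.
aggressive = [0, 0, 0, 3]     # three exact predictions, one miss by 3 stars
# High threshold: never exact here, but every deviation is small.
conservative = [1, 1, 1, 1]

# aggressive wins on MAE (0.75 vs 1.0) yet loses on RMSE (1.5 vs 1.0).
```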
5.5 Standard Deviation Impact on the Recommendation Error

When recommending items using predicted ratings, the users' standard deviations play an important role in the recommendation error. Users with a standard deviation of zero, i.e., users who attributed the same rating in all their reviews, should have the lowest impact on the recommendation error. The objective of this test is to discover how significant the impact of this variable is, and whether the absolute error spikes for users with higher standard deviations.
In Figures 5.6 and 5.7, each point represents a user, positioned according to the user's absolute error and standard deviation values. The line in these two graphs indicates the average value of the points in its proximity.
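That trend line is a binned average, which can be sketched as follows (a hypothetical helper; the per-user (standard deviation, absolute error) pairs are assumed to be precomputed from the test output):

```python
def binned_average(points, bin_width=0.25):
    """Average the absolute error of the users falling in each
    standard-deviation bin, producing the trend line drawn over the
    scatter plots of Figures 5.6 and 5.7.

    points: iterable of (std_dev, abs_error) pairs, one per user.
    Returns {bin_start: mean absolute error of users in that bin}.
    """
    bins = {}
    for std_dev, abs_error in points:
        key = int(std_dev / bin_width)
        bins.setdefault(key, []).append(abs_error)
    return {round(k * bin_width, 2): sum(v) / len(v)
            for k, v in sorted(bins.items())}
```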
Fig. 5.6 presents the data from the Epicurious dataset. The result for this dataset was expected, since it is normal for the absolute error to increase slowly for users with higher standard deviations. A spike in the absolute error towards the higher values of the standard deviation would not be good, as it would imply that the recommendation algorithm was having a very small impact on the predicted ratings. Taking into consideration the small size of this dataset, and the lighter density of points in the graph towards the higher values of standard deviation, there was probably not enough data on users with high deviation for the absolute error to stagnate.

Figure 5.7: Mapping of each user's absolute error and standard deviation in the Food.com dataset

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations even for users with high standard deviations.
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since, in a real system, the user's preferences are built over time, the objective of this test is to simulate the continuous learning of the algorithm using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes; this was the highest threshold chosen for this dataset, in order to keep a considerable number of users from which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
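The simulation loop for this test can be sketched as follows (hypothetical interfaces: `train_fn` and `predict_fn` stand for the prototype-building and rating-prediction steps of the experimental component):

```python
def learning_curve(user_reviews, train_fn, predict_fn):
    """Grow the training set one review at a time and track the MAE on the
    remaining reviews, simulating a user's preferences being built over time.

    user_reviews: list of (recipe_features, rating) pairs for one user.
    train_fn:     builds a prototype from the training reviews.
    predict_fn:   predicts a rating from a prototype and recipe features.
    Returns one mean absolute error per round.
    """
    curve = []
    for k in range(1, len(user_reviews)):
        train, validation = user_reviews[:k], user_reviews[k:]
        prototype = train_fn(train)
        errors = [abs(predict_fn(prototype, feats) - rating)
                  for feats, rating in validation]
        curve.append(sum(errors) / len(errors))
    return curve
```

Averaging these per-user curves over the selected user groups yields the plots in Figures 5.8 to 5.10.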
The training set represents the recipes used to build the users' prototype vectors, so in each round an additional review is added to the training set and removed from the validation set, in order to simulate the learning process. In Fig. 5.8 it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small size of the dataset there are not enough users with more rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes that marks a threshold where the recommendation error stagnates.

Figure 5.8: Learning curve using the Epicurious dataset, up to 40 rated recipes

Figure 5.9: Learning curve using the Food.com dataset, up to 100 rated recipes

Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22] and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive from negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations demonstrated the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
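That prototype construction can be sketched in the standard Rocchio form (a sketch under assumptions: the α/β weights and the fixed 3-star threshold separating positive from negative observations are illustrative, not the thesis' tuned settings):

```python
def build_prototype(rated_recipes, num_features, alpha=16, beta=4, threshold=3):
    """Rocchio prototype: centroid of positively rated recipe vectors minus a
    down-weighted centroid of the negatively rated ones.

    rated_recipes: list of (feature_vector, rating) pairs for one user.
    """
    positive = [v for v, r in rated_recipes if r >= threshold]
    negative = [v for v, r in rated_recipes if r < threshold]

    def centroid(vectors):
        # Component-wise mean; the zero vector when the class is empty.
        if not vectors:
            return [0.0] * num_features
        return [sum(v[i] for v in vectors) / len(vectors)
                for i in range(num_features)]

    pos_c, neg_c = centroid(positive), centroid(negative)
    return [alpha * p - beta * n for p, n in zip(pos_c, neg_c)]
```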
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values.

Being two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietary categories), shows that using all features combined outperforms every individual feature, as well as the pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
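Following [20], a content-boosted approach first densifies the sparse user-item rating matrix with content-based predictions, and then runs collaborative filtering over the filled matrix. The densification step could be sketched as follows (hypothetical interfaces; `content_predict` stands in for the Rocchio-based component explored in this work):

```python
def densify_ratings(rating_matrix, content_predict):
    """Fill the missing entries of a sparse user x recipe rating matrix with
    content-based predictions, as in content-boosted collaborative filtering.

    rating_matrix:   {user: {recipe: rating}} with missing entries.
    content_predict: callable (user, recipe) -> predicted rating.
    """
    all_recipes = {r for ratings in rating_matrix.values() for r in ratings}
    dense = {}
    for user, ratings in rating_matrix.items():
        dense[user] = {recipe: ratings.get(recipe, content_predict(user, recipe))
                       for recipe in all_recipes}
    return dense  # a neighbourhood-based CF step would then run on `dense`
```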
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of the day (i.e., lunch or dinner), the total meal cost, and the total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information are available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value observation. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
Figure 53 Lower similarity threshold variation test using Foodcom dataset
Figure 54 Upper similarity threshold variation test using Epicurious dataset
From this test it is clear that the lower threshold only has a negative effect in the recommendation
accuracy and subtracting the standard deviation does not help The accentuated drop in error value
seen in the graph from Fig 52 occurs when the lower case (average rating minus standard deviation)
is completely removed
As a result of these tests Eq 46 was updated to
40
Figure 55 Upper similarity threshold variation test using Foodcom dataset
Rating =
average rating + standard deviation if similarity gt= U
average rating if similarity lt U
(51)
Using Eq 51 it is time to test the upper similarity threshold Figures 54 and 55 present the test
results for theEpicurious and Foodcom datasets respectively For each similarity value represented
by the points in the graphs the MAE and RMSE were obtained using the same cross-validation tests
multiple times on the experimental recommendation component adjusting the upper similarity value
between each test
The results obtained were interesting As mentioned in Section 22 MAE computes the devia-
tion between predicted ratings and actual ratings RMSE is very similar to MAE but places more
emphasis on higher deviations These definitions help to understand the results of this test Both
datasets react in a very similar way when the threshold U is varied in the interval [0 075] The MAE
decreases and the RMSE increases When lowering the similarity threshold the recommendation
system predicts the correct rating value more times which results in a lower average error so the
MAE is lower But although it is predicting the exact rating value more times in the cases where it
misses the deviation between the predicted rating and the actual rating is higher and since RMSE
places more emphasis on higher deviations the RMSE values increase The best similarity threshold
is subjective some systems may benefit more from a higher rate of exact predictions while in others
a lower deviation between the predicted ratings and the actual ratings is more suitable In this test
41
Figure 56 Mapping of the userrsquos absolute error and standard deviation from the Epicurious dataset
the lowest results registered using the Epicurious dataset were 06544 MAE and 08601 RMSE
With the Foodcom dataset the lowest MAE registered was 03229 and the lowest RMSE 06230
Compared directly with the YoLP Content-based component which obtained the overall lowest
error rates from all the baselines the experimental recommendation component showed better re-
sults when using the Foodcom dataset
55 Standard Deviation Impact in Recommendation Error
When recommending items using predicted ratings the user standard deviation plays an important
role in the recommendation error Users with the standard deviation equal to zero ie users that
attributed the same rating to all their reviews should have the lowest impact in the recommendation
error The objective of this test is to discover how significant is the impact of this variable and if the
absolute error does not spike for users with higher standard deviations
In the Figures 56 and 57 each point represents a user the point on the graph is positioned
according to the userrsquos absolute error and standard deviation values The line in these two graphs
indicates the average value of the points in that proximity
Fig 56 represents the data from the Epicurious dataset The result for this dataset was expected
since it is normal for the absolute error to slowly increase for users with higher standard deviations
It would not be good if a spike in the absolute error was noted towards the higher values of the
standard deviation which would imply that the recommendation algorithm was having a very small
impact in the predicted ratings Having in consideration the small dimensionality of this dataset and
the lighter density of points in the graph towards the higher values of standard deviation probably
42
Figure 57 Mapping of the userrsquos absolute error and standard deviation from the Foodcom dataset
there was not enough data on users with high deviation for the absolute error to stagnate
Fig 57 presents good results since it shows that the absolute error of the users starts to stagnate
for users with standard deviations higher than 1 This implies that the algorithm is learning the userrsquos
preferences and returning good recommendations even to users with high standard deviations
56 Rocchiorsquos Learning Curve
The Rocchio algorithm bases the recommendations on the similarity between the userrsquos preferences
and the recipe features Since the userrsquos preferences in a real system are built over time the objec-
tive of this test is to simulate the continuous learning of the algorithm using the datasets studied in
this work and analyse if the recommendation error starts to converge after a determined amount of
reviews are made In order to perform this test first the datasets were analysed to find a group of
users with enough recipes rated to study the improvements in the recommendation The Epicurious
dataset contains 71 users that rated over 40 recipes This number of recipes rated was the highest
chosen threshold for this dataset in order to maintain a considerable amount of users to average the
recommendation errors from see Fig 58 In Foodcom 1571 users were found that rated over 100
recipes and since the results of this experiment showed a consistent drop in the errors measured
as seen in Fig 59 another test was made using the 269 users that rated over 500 recipes as seen
in Fig 510
The training set represents the recipes that are used to build the usersrsquo prototype vectors so for
each round an additional review is added to the training set and removed from the validation set
in order to simulate the learning process In Fig 58 it is possible to see a steady decrease in
43
Figure 58 Learning Curve using the Epicurious dataset up to 40 rated recipes
Figure 59 Learning Curve using the Foodcom dataset up to 100 rated recipes
error and after 25 recipes rated the error fluctuates around the same values Although it would be
interesting to perform this experiment with a higher number of rated recipes too see the progress
of the recommendation error due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test The Figures 59 and 510 show a constant
improvement in the recommendation although there is not a clear number of rated recipes that marks
44
Figure 510 Learning Curve using the Foodcom dataset up to 500 rated recipes
a threshold where the recommendation error stagnates
45
46
Chapter 6
Conclusions
In this MSc dissertation the applicability of content-based methods in personalized food recom-
mendation was explored Using the well-known Rocchio algorithm several approaches were tested
to further explore the breaking of recipes down into ingredients presented in [22] and use more
variables related to personalized food recommendation
Recipes were represented as vectors of features weights determined by their Inverse Recipe
Frequency (Eq 44) The Rocchio algorithm was never explored in food recommendation so vari-
ous approaches were tested to build the usersrsquo prototype vector and transform the similarity value
returned by the algorithm into a rating value needed to compute the performance of the recommen-
dation system When building the prototype vectors the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations The combination of both
user and item average ratings and standard deviations demonstrated the best results to transform
the similarity value into a rating value These approaches combined returned the best performance
values of the experimental recommendation component
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to adjust the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, failing to improve on the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information contained only the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail both from the recipes and from the prototype vectors; together with the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
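The upper similarity threshold rule tested above can be sketched as follows. This simplified version uses only user-level statistics, whereas the full method combines user and item averages and standard deviations, and the default threshold value is illustrative.

```python
def predict_rating(similarity, user_avg, user_std, upper=0.75):
    """Upper-threshold rule: if the recipe's similarity to the user's
    prototype clears `upper` (U), predict above the user's average rating;
    otherwise fall back to the average. The default U is illustrative."""
    if similarity >= upper:
        return user_avg + user_std
    return user_avg
```

Lowering U trades larger misses for more exact hits, which is precisely the MAE versus RMSE tension observed in the threshold experiments.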
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, implementing this hybrid approach yielded a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
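A minimal sketch of the content-boosted idea of [20]: observed ratings are kept, and the gaps in the user-item matrix are filled with content-based predictions so that a collaborative filter can then operate on a dense matrix. The names below are hypothetical, and the `content_predict` callback stands in for the Rocchio-based predictor explored in this work.

```python
def content_boosted_matrix(ratings, content_predict, users, items):
    """Build a pseudo rating matrix: keep observed ratings and fill every
    missing (user, item) cell with the content-based prediction.
    `ratings` maps (user, item) pairs to observed rating values."""
    return {
        (u, i): ratings.get((u, i), content_predict(u, i))
        for u in users
        for i in items
    }
```

A collaborative filtering algorithm would then run over this dense pseudo matrix instead of the sparse original.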
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (e.g., winter/fall or summer/spring), the time of day (e.g., lunch or dinner), the total meal cost, or the total calories, amongst others. Studying the impact that these features have on the recommendation is another interesting direction to pursue in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to observations with a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute a predicted rating to it.
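A minimal sketch of this class-vector idea, with hypothetical helper names: one averaged feature vector is kept per observed rating value, and the predicted rating is simply the class whose vector is most similar to the target recipe.

```python
import math

def cosine(a, b):
    """Cosine between two sparse feature vectors represented as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_class_vectors(rated_recipes):
    """One averaged feature vector per observed rating value.
    `rated_recipes` is a list of (feature_weights: dict, rating) pairs."""
    sums, counts = {}, {}
    for vec, rating in rated_recipes:
        acc = sums.setdefault(rating, {})
        for f, w in vec.items():
            acc[f] = acc.get(f, 0.0) + w
        counts[rating] = counts.get(rating, 0) + 1
    return {r: {f: w / counts[r] for f, w in acc.items()} for r, acc in sums.items()}

def predict_from_classes(class_vectors, recipe_vec):
    """Predicted rating = the rating class most similar to the recipe."""
    return max(class_vectors, key=lambda r: cosine(class_vectors[r], recipe_vec))
```

No similarity-to-rating transformation is needed here: the winning class already carries the rating value.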
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction, volume 40. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.

[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.

[6] G. Salton. Automatic Text Processing, volume 14. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006. URL http://dl.acm.org/citation.cfm?id=1248566.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.
vector with the highest similarity represents the class where the recipe fits the best
Using this method removes the need to transform the similarity measure into a rating since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating
48
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web
4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink
springercom101007978-3-540-72079-9
[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive
Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl
acmorgcitationcfmid=1248566
49
[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification
In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN
0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload
doi=1011659324amprep=rep1amptype=pdf
[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-
egorization In Proceedings of the Fourteenth International Conference on Machine
Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics
bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011
329956amprep=rep1amptype=pdf$delimiter026E30F$npapers2publicationuuid
23DB36B5-2348-44C4-B831-DBDD6EC7702D
[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for
collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172
x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+
Analysis+of+Predictive+Algorithms+for+Collaborative+Filtering0
[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A
Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and
Data Engineering 17(6)734ndash749 2005
[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender
Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms
In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403
1998
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting
to Know You Learning New User Preferences in Recommender Systems In Proceedings
50
of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN
1581134592 doi 101145502716502737
[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-
orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling
and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A
1026501525781
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
51
Figure 5.7: Mapping of the users' absolute error and standard deviation, from the Food.com dataset; there was not enough data on users with high deviation for the absolute error to stagnate.

Fig. 5.7 presents good results, since it shows that the absolute error starts to stagnate for users with standard deviations higher than 1. This implies that the algorithm is learning the users' preferences and returning good recommendations, even for users with high standard deviations.
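The per-user mapping plotted in Fig. 5.7 can be reproduced with a short script; the data layout below (ratings and aligned predictions per user) is an assumption for illustration, not the thesis's actual code.

```python
from statistics import mean, pstdev

def error_vs_deviation(ratings, predictions):
    """For each user, pair the standard deviation of their ratings with
    the mean absolute error of the predictions they received.

    ratings:     {user: [true rating, ...]}
    predictions: {user: [predicted rating, ...]} (aligned with ratings)
    """
    points = []
    for user, true in ratings.items():
        pred = predictions[user]
        mae = mean(abs(t - p) for t, p in zip(true, pred))
        points.append((pstdev(true), mae))  # x = rating std, y = MAE
    return points

# Toy example: a low-variance user "a" and a high-variance user "b".
pts = error_vs_deviation(
    {"a": [4, 4, 5], "b": [1, 5, 3]},
    {"a": [4, 5, 5], "b": [2, 4, 4]},
)
```

Plotting the resulting `(std, MAE)` pairs gives exactly the kind of scatter shown in the figure.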
5.6 Rocchio's Learning Curve
The Rocchio algorithm bases its recommendations on the similarity between the user's preferences and the recipe features. Since a user's preferences in a real system are built over time, the objective of this test is to simulate the continuous learning of the algorithm, using the datasets studied in this work, and to analyse whether the recommendation error starts to converge after a certain number of reviews. To perform this test, the datasets were first analysed to find a group of users with enough rated recipes to study the improvements in the recommendation. The Epicurious dataset contains 71 users that rated over 40 recipes. This number of rated recipes was the highest threshold chosen for this dataset, in order to maintain a considerable number of users over which to average the recommendation errors (see Fig. 5.8). In Food.com, 1571 users were found that rated over 100 recipes and, since the results of this experiment showed a consistent drop in the errors measured, as seen in Fig. 5.9, another test was made using the 269 users that rated over 500 recipes, as seen in Fig. 5.10.
The training set represents the recipes used to build the users' prototype vectors; for each round, an additional review is added to the training set and removed from the validation set, in order to simulate the learning process.

Figure 5.8: Learning Curve using the Epicurious dataset, up to 40 rated recipes.

Figure 5.9: Learning Curve using the Food.com dataset, up to 100 rated recipes.

In Fig. 5.8, it is possible to see a steady decrease in error; after 25 rated recipes, the error fluctuates around the same values. Although it would be interesting to perform this experiment with a higher number of rated recipes, to see the progress of the recommendation error, due to the small dimension of the dataset there are not enough users with more rated recipes to perform the test. Figures 5.9 and 5.10 show a constant improvement in the recommendation, although there is no clear number of rated recipes marking a threshold where the recommendation error stagnates.

Figure 5.10: Learning Curve using the Food.com dataset, up to 500 rated recipes.
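The incremental protocol described above (each round moves one review from the validation set into the training set) can be sketched generically; `fit`, `predict`, `error`, and the mean-rating baseline in the usage example are illustrative placeholders, not the thesis implementation.

```python
def learning_curve(reviews, fit, predict, error):
    """Simulate incremental learning for one user: at round k, the first
    k reviews form the training set and the rest the validation set.

    reviews: chronologically ordered list of (recipe_features, rating)
    fit(train)        -> model (e.g. a Rocchio prototype vector)
    predict(model, x) -> predicted rating for recipe features x
    error(true, pred) -> scalar error (e.g. absolute error)
    Returns the mean validation error after each round.
    """
    curve = []
    for k in range(1, len(reviews)):
        train, valid = reviews[:k], reviews[k:]
        model = fit(train)
        errs = [error(r, predict(model, x)) for x, r in valid]
        curve.append(sum(errs) / len(errs))
    return curve

# Usage with a trivial baseline: predict the user's mean training rating.
curve = learning_curve(
    [(None, 4), (None, 5), (None, 4), (None, 4)],
    fit=lambda tr: sum(r for _, r in tr) / len(tr),
    predict=lambda m, x: m,
    error=lambda t, p: abs(t - p),
)
```

Averaging such curves over the selected user groups yields the plots in Figs. 5.8 to 5.10.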
Chapter 6
Conclusions
In this M.Sc. dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the breaking down of recipes into ingredients presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). The Rocchio algorithm had never been explored in food recommendation, so various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. The combination of both user and item average ratings and standard deviations demonstrated the best results for transforming the similarity value into a rating value. Combined, these approaches returned the best performance values of the experimental recommendation component.
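The pipeline just described (threshold-based prototype construction, similarity computation, and a similarity-to-rating transform based on user and item statistics) can be sketched as follows. The threshold, the Rocchio weights, and the exact form of the transform are illustrative assumptions, not the thesis's tuned settings.

```python
import math

def build_prototype(rated, threshold=3.0, beta=0.75, gamma=0.25):
    """Rocchio prototype for one user: recipes rated above the fixed
    threshold count as positive observations, the rest as negative.
    beta/gamma are the usual Rocchio weights (illustrative values)."""
    pos = [v for v, r in rated if r > threshold]
    neg = [v for v, r in rated if r <= threshold]
    proto = {}
    for group, w in ((pos, beta / max(len(pos), 1)),
                     (neg, -gamma / max(len(neg), 1))):
        for vec in group:
            for feat, x in vec.items():
                proto[feat] = proto.get(feat, 0.0) + w * x
    return proto

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(x * b.get(f, 0.0) for f, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_to_rating(sim, user_avg, user_std, item_avg, item_std):
    """One plausible transform combining user and item statistics:
    centre on the combined average and scale by the combined std."""
    return (user_avg + item_avg) / 2 + sim * (user_std + item_std) / 2
```

A recipe the user would like then scores high against the prototype and is mapped to a rating near the top of the user's usual range.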
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in the recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since the two datasets have very different characteristics, not improving over the baseline results in both was not completely unexpected. In the Epicurious dataset, the recipe ingredient information only contained the main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors; adding the major difference in dataset sizes, these could be some of the reasons why the difference in performance was observed.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe the recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that the use of all features combined outperforms every feature individually, as well as the other pairwise combinations.
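As a sketch of how such a feature study can be set up, recipe vectors can be built from any subset of the feature families and fed to the same recommender; the dictionary layout and family names below are illustrative, not the thesis's data model.

```python
def recipe_vector(recipe, families=("ingredients", "cuisines", "dietaries")):
    """Combine the selected feature families into one sparse vector.
    Prefixing each feature with its family keeps names from colliding
    (e.g. a cuisine and an ingredient that share a name)."""
    vec = {}
    for fam in families:
        for feat in recipe.get(fam, []):
            vec[f"{fam}:{feat}"] = 1.0
    return vec
```

Running the evaluation once per family, per pair, and once with all families selected reproduces the comparison described above.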
6.1 Future Work
Implementing a content-boosted collaborative filtering system, using the content-based method explored in this work, would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance increase of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine if a similar decrease in the MAE could be achieved by implementing this hybrid approach in the food recommendation domain.
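A minimal sketch of the content-boosted idea of Melville et al. [20]: the sparse rating matrix is densified with content-based predictions (pseudo-ratings) before collaborative filtering runs on it. The function names and data layout are assumptions for illustration.

```python
def content_boost(ratings, cb_predict, users, items):
    """Densify the user-item matrix: keep observed ratings and fill the
    missing cells with content-based pseudo-ratings. A collaborative
    filtering algorithm would then operate on the dense matrix."""
    dense = {}
    for u in users:
        for i in items:
            if (u, i) in ratings:
                dense[(u, i)] = ratings[(u, i)]   # observed rating wins
            else:
                dense[(u, i)] = cb_predict(u, i)  # pseudo-rating
    return dense
```

Here `cb_predict` would be the Rocchio-based predictor explored in this work.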
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, or total calories, amongst others. The study of the impact that these features have on the recommendation is another interesting point to approach in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to a specific rating value. When recommending a recipe, its feature vector is compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute to it a predicted rating.
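The multi-prototype variant suggested above can be sketched as follows; cosine similarity and the dictionary layout are illustrative choices, not a specification of the proposed method.

```python
import math

def class_prototypes(rated, scale=(1, 2, 3, 4, 5)):
    """One prototype per rating value: each class accumulates the
    feature weights of the recipes the user rated with that value."""
    protos = {r: {} for r in scale}
    for vec, r in rated:
        for feat, x in vec.items():
            protos[r][feat] = protos[r].get(feat, 0.0) + x
    return protos

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(x * b.get(f, 0.0) for f, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_rating(protos, recipe_vec):
    """The predicted rating is the class whose prototype is most similar
    to the recipe, so no similarity-to-rating transform is needed."""
    scored = {r: cosine(p, recipe_vec) for r, p in protos.items() if p}
    return max(scored, key=scored.get)
```

The rating value of the winning class is returned directly, which is exactly the simplification argued for in the text.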
Bibliography

[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001.

[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.

[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.

[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender systems in computer science and information systems - a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3.

[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7.

[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.

[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. The Adaptive Web, 4321:325–341, 2007.

[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.

[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.

[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.

[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.

[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.

[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001.

[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002.

[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004.

[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.

[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.

[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010.

[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000.

[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. Volume 8444 of LNCS, 2014.

[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995.
51
Figure 58 Learning Curve using the Epicurious dataset up to 40 rated recipes
Figure 59 Learning Curve using the Foodcom dataset up to 100 rated recipes
error and after 25 recipes rated the error fluctuates around the same values Although it would be
interesting to perform this experiment with a higher number of rated recipes too see the progress
of the recommendation error due to the small dimension of the dataset there are not enough users
with a higher number of rated recipes to perform the test The Figures 59 and 510 show a constant
improvement in the recommendation although there is not a clear number of rated recipes that marks
44
Figure 510 Learning Curve using the Foodcom dataset up to 500 rated recipes
a threshold where the recommendation error stagnates
45
46
Chapter 6
Conclusions
In this MSc dissertation the applicability of content-based methods in personalized food recom-
mendation was explored Using the well-known Rocchio algorithm several approaches were tested
to further explore the breaking of recipes down into ingredients presented in [22] and use more
variables related to personalized food recommendation
Recipes were represented as vectors of features weights determined by their Inverse Recipe
Frequency (Eq 44) The Rocchio algorithm was never explored in food recommendation so vari-
ous approaches were tested to build the usersrsquo prototype vector and transform the similarity value
returned by the algorithm into a rating value needed to compute the performance of the recommen-
dation system When building the prototype vectors the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations The combination of both
user and item average ratings and standard deviations demonstrated the best results to transform
the similarity value into a rating value These approaches combined returned the best performance
values of the experimental recommendation component
After determining the best approach to adapt the Rocchio algorithm to food recommendations
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results The final results of the experimental component showed improvements in
the recommendation performance when using the Foodcom dataset With the Epicurious dataset
some baselines like the content-based method implemented in YoLP registered lower error values
Being two datasets with very different characteristics not improving the baseline results in both
was not completely unexpected In the Epicurious dataset the recipe ingredient information only
contained its main ingredients which were chosen by the user in the moment of the review opposed
to the full ingredient information that recipes have in the Foodcom dataset This removes a lot of
detail both in the recipes and in the prototype vectors and adding the major difference in the dataset
sizes these could be some of the reasons why the difference in performance was observed
The datasets used in this work were the only ones found that better suited the objective of the
47
experiments ie that contained user reviews allowing to validate the studied approaches Since
there are very few studies related to food recommendations the features that better describe the
recipes are still undefined The feature study performed in this work which explored all the features
available in both datasets ingredients cuisines and dietaries shows that the use of all features
combined outperforms every feature individually or other pairwise combinations
61 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method ex-
plored in this work would be an interesting experiment for a future work As mentioned in Section
32 by implementing this hybrid approach a performance increase of 92 as measured with the
MAE metric was obtained when compared to a pure content-based method [20] This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain
The experimental component can be configured to include more variables in the recommendation
process for example season of the year (ie winterfall or summerspring) time of the day (ie
lunch or dinner) total meal cost total calories amongst others The study of the impact that these
features have on the recommendation is also another interesting point to approach in the future
when datasets with more information are available
Instead of representing users as classes in Rocchio a set of class vectors created for each
user could represent their preferences From the user rated recipes each class would contain the
features weights related with a specific rating value observation When recommending a recipe its
feature vector is compared with the userrsquos set of vectors so according to the userrsquos preferences the
vector with the highest similarity represents the class where the recipe fits the best
Using this method removes the need to transform the similarity measure into a rating since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating
48
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web
4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink
springercom101007978-3-540-72079-9
[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive
Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl
acmorgcitationcfmid=1248566
49
[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification
In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN
0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload
doi=1011659324amprep=rep1amptype=pdf
[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-
egorization In Proceedings of the Fourteenth International Conference on Machine
Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics
bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011
329956amprep=rep1amptype=pdf$delimiter026E30F$npapers2publicationuuid
23DB36B5-2348-44C4-B831-DBDD6EC7702D
[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for
collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172
x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+
Analysis+of+Predictive+Algorithms+for+Collaborative+Filtering0
[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A
Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and
Data Engineering 17(6)734ndash749 2005
[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender
Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[Figure 5.10: Learning curve using the Food.com dataset, up to 500 rated recipes — the curve reaches a threshold where the recommendation error stagnates.]
Chapter 6
Conclusions
In this MSc dissertation, the applicability of content-based methods to personalized food recommendation was explored. Using the well-known Rocchio algorithm, several approaches were tested to further explore the idea of breaking recipes down into ingredients, presented in [22], and to use more variables related to personalized food recommendation.
Recipes were represented as vectors of feature weights, determined by their Inverse Recipe Frequency (Eq. 4.4). Since the Rocchio algorithm had not previously been explored in food recommendation, various approaches were tested to build the users' prototype vectors and to transform the similarity value returned by the algorithm into a rating value, which is needed to compute the performance of the recommendation system. When building the prototype vectors, the approach that returned the best results used a fixed threshold to differentiate positive and negative observations. To transform the similarity value into a rating value, the combination of both user and item average ratings and standard deviations demonstrated the best results. Combined, these approaches returned the best performance values of the experimental recommendation component.
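The pipeline described above can be sketched as follows. This is a minimal illustration, not the dissertation's actual implementation: the function names, the β/γ weights, the IDF-style form of the Inverse Recipe Frequency weight (log(N/n_f)), and the exact shape of the similarity-to-rating transform are all assumptions made for the sake of the example.

```python
import math
from collections import defaultdict

def irf_weights(recipes):
    """Inverse-Recipe-Frequency weight per feature, assumed here to be
    an IDF-style log(N / n_f), where n_f is the number of recipes that
    contain feature f (the actual form is Eq. 4.4 in the text)."""
    n = len(recipes)
    df = defaultdict(int)
    for feats in recipes.values():
        for f in set(feats):
            df[f] += 1
    return {f: math.log(n / c) for f, c in df.items()}

def prototype(user_ratings, recipes, irf, threshold=3.0, beta=1.0, gamma=0.75):
    """Rocchio prototype for one user: average IRF-weighted vectors of
    positively rated recipes (rating >= fixed threshold) minus those of
    negatively rated ones."""
    pos = [r for r, rating in user_ratings.items() if rating >= threshold]
    neg = [r for r, rating in user_ratings.items() if rating < threshold]
    proto = defaultdict(float)
    for group, sign, size in ((pos, beta, len(pos) or 1),
                              (neg, -gamma, len(neg) or 1)):
        for r in group:
            for f in recipes[r]:
                proto[f] += sign * irf[f] / size
    return dict(proto)

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_rating(sim, user_avg, user_std, item_avg, item_std):
    """Hypothetical similarity-to-rating transform combining user and
    item averages and standard deviations; the text only states that
    this combination of statistics performed best, not its exact form."""
    return (user_avg + item_avg) / 2 + sim * (user_std + item_std) / 2
```

For example, a user who rated a salty recipe highly and a flour-based one poorly ends up with a prototype that weights `egg`-like features positively and `flour`-like features negatively, and the cosine similarity of a candidate recipe against that prototype is then mapped onto the rating scale.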
After determining the best approach to adapt the Rocchio algorithm to food recommendation, the similarity threshold test was performed to tune the algorithm and seek improvements in the recommendation results. The final results of the experimental component showed improvements in recommendation performance when using the Food.com dataset. With the Epicurious dataset, some baselines, like the content-based method implemented in YoLP, registered lower error values. Since these are two datasets with very different characteristics, not improving on the baseline results in both was not completely unexpected. In the Epicurious dataset, the ingredient information of a recipe only contains its main ingredients, chosen by the user at the moment of the review, as opposed to the full ingredient information that recipes have in the Food.com dataset. This removes a lot of detail, both from the recipes and from the prototype vectors; adding to this the major difference in dataset sizes, these could be some of the reasons for the observed difference in performance.
The datasets used in this work were the only ones found that suited the objective of the experiments, i.e., that contained user reviews, allowing the studied approaches to be validated. Since there are very few studies related to food recommendation, the features that best describe recipes are still undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietary categories), shows that the use of all features combined outperforms every feature individually, as well as all other pairwise combinations.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, by implementing this hybrid approach, a performance improvement of 9.2%, as measured with the MAE metric, was obtained when compared to a pure content-based method [20]. This experiment would determine whether a similar decrease in the MAE could be achieved by implementing the hybrid approach in the food recommendation domain.
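In the style of Melville et al. [20], such a hybrid fills the missing entries of the user-item rating matrix with content-based predictions and then runs user-based collaborative filtering on the densified matrix. The sketch below is an assumption-laden illustration of that idea, not the cited paper's exact algorithm: the function name, the neighbourhood size, and the use of plain Pearson-weighted neighbours are choices made for this example.

```python
import numpy as np

def content_boosted_cf(ratings, content_pred, target_user, target_item, k=20):
    """Content-boosted CF sketch: `ratings` is a users x items array with
    np.nan for unrated items; `content_pred` is a same-shaped array of
    content-based predictions (e.g. from a Rocchio-style recommender).
    Missing ratings are replaced by content predictions, then a
    mean-centred, correlation-weighted average over the k most similar
    users yields the final prediction."""
    dense = np.where(np.isnan(ratings), content_pred, ratings)
    means = dense.mean(axis=1, keepdims=True)
    centred = dense - means
    # Similarity of every user to the target user on the densified matrix.
    norms = np.linalg.norm(centred, axis=1)
    sims = centred @ centred[target_user] / (norms * norms[target_user] + 1e-12)
    sims[target_user] = 0.0  # exclude the target user from their own neighbourhood
    top = np.argsort(-np.abs(sims))[:k]
    num = (sims[top] * centred[top, target_item]).sum()
    den = np.abs(sims[top]).sum() + 1e-12
    return float(means[target_user, 0] + num / den)
```

Because the matrix is densified before the neighbourhood computation, similarity estimates no longer collapse for users with few co-rated items, which is the mechanism behind the reported MAE gains.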
The experimental component can be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, and total calories, amongst others. Studying the impact that these features have on the recommendations is another interesting direction to pursue in the future, when datasets with more information become available.
Instead of representing users as classes in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class would contain the feature weights related to observations with a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of vectors; according to the user's preferences, the vector with the highest similarity represents the class where the recipe fits best. Using this method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the targeted recipe would automatically attribute it a predicted rating.
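This proposed variant amounts to a nearest-prototype (multi-class Rocchio) classifier per user, and could be sketched as follows. The function names and the uniform IRF weighting are illustrative assumptions; the thesis only outlines the idea.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def class_prototypes(user_ratings, recipes, irf):
    """One averaged prototype vector per observed rating class (e.g. 1-5),
    built from the IRF-weighted features of the recipes the user rated
    with that value."""
    protos = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r, rating in user_ratings.items():
        counts[rating] += 1
        for f in recipes[r]:
            protos[rating][f] += irf[f]
    return {c: {f: w / counts[c] for f, w in vec.items()}
            for c, vec in protos.items()}

def predict_class(recipe_feats, protos, irf):
    """The predicted rating is simply the class whose prototype is most
    similar to the candidate recipe, so no separate similarity-to-rating
    transformation is needed."""
    vec = {f: irf[f] for f in recipe_feats}
    return max(protos, key=lambda c: cosine(vec, protos[c]))
```

A recipe sharing its features with the user's 5-star recipes is thus assigned the rating 5 directly, which is exactly the simplification the paragraph above argues for.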
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender systems in computer science and information systems – a landscape of research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. de Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web, volume 4321 of LNCS, pages 325–341, 2007. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[10] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.
[12] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-based weighted-majority prediction for recommender systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative filtering using weighted majority prediction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel. Probabilistic memory-based collaborative filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization, volume 6075 of LNCS, pages 381–386, 2010. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A content-based matrix factorization model for recipe recommendation. In Advances in Knowledge Discovery and Data Mining, volume 8444 of LNCS, 2014.
[26] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
46
Chapter 6
Conclusions
In this MSc dissertation the applicability of content-based methods in personalized food recom-
mendation was explored Using the well-known Rocchio algorithm several approaches were tested
to further explore the breaking of recipes down into ingredients presented in [22] and use more
variables related to personalized food recommendation
Recipes were represented as vectors of features weights determined by their Inverse Recipe
Frequency (Eq 44) The Rocchio algorithm was never explored in food recommendation so vari-
ous approaches were tested to build the usersrsquo prototype vector and transform the similarity value
returned by the algorithm into a rating value needed to compute the performance of the recommen-
dation system When building the prototype vectors the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations The combination of both
user and item average ratings and standard deviations demonstrated the best results to transform
the similarity value into a rating value These approaches combined returned the best performance
values of the experimental recommendation component
After determining the best approach to adapt the Rocchio algorithm to food recommendations
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results The final results of the experimental component showed improvements in
the recommendation performance when using the Foodcom dataset With the Epicurious dataset
some baselines like the content-based method implemented in YoLP registered lower error values
Being two datasets with very different characteristics not improving the baseline results in both
was not completely unexpected In the Epicurious dataset the recipe ingredient information only
contained its main ingredients which were chosen by the user in the moment of the review opposed
to the full ingredient information that recipes have in the Foodcom dataset This removes a lot of
detail both in the recipes and in the prototype vectors and adding the major difference in the dataset
sizes these could be some of the reasons why the difference in performance was observed
The datasets used in this work were the only ones found that better suited the objective of the
47
experiments ie that contained user reviews allowing to validate the studied approaches Since
there are very few studies related to food recommendations the features that better describe the
recipes are still undefined The feature study performed in this work which explored all the features
available in both datasets ingredients cuisines and dietaries shows that the use of all features
combined outperforms every feature individually or other pairwise combinations
61 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method ex-
plored in this work would be an interesting experiment for a future work As mentioned in Section
32 by implementing this hybrid approach a performance increase of 92 as measured with the
MAE metric was obtained when compared to a pure content-based method [20] This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain
The experimental component can be configured to include more variables in the recommendation
process for example season of the year (ie winterfall or summerspring) time of the day (ie
lunch or dinner) total meal cost total calories amongst others The study of the impact that these
features have on the recommendation is also another interesting point to approach in the future
when datasets with more information are available
Instead of representing users as classes in Rocchio a set of class vectors created for each
user could represent their preferences From the user rated recipes each class would contain the
features weights related with a specific rating value observation When recommending a recipe its
feature vector is compared with the userrsquos set of vectors so according to the userrsquos preferences the
vector with the highest similarity represents the class where the recipe fits the best
Using this method removes the need to transform the similarity measure into a rating since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating
48
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web
4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink
springercom101007978-3-540-72079-9
[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive
Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl
acmorgcitationcfmid=1248566
49
[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification
In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN
0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload
doi=1011659324amprep=rep1amptype=pdf
[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-
egorization In Proceedings of the Fourteenth International Conference on Machine
Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics
bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011
329956amprep=rep1amptype=pdf$delimiter026E30F$npapers2publicationuuid
23DB36B5-2348-44C4-B831-DBDD6EC7702D
[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for
collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172
x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+
Analysis+of+Predictive+Algorithms+for+Collaborative+Filtering0
[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A
Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and
Data Engineering 17(6)734ndash749 2005
[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender
Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms
In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403
1998
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
[16] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[17] A M Rashid I Albert D Cosley S K Lam S M Mcnee J A Konstan and J Riedl Getting
to Know You Learning New User Preferences in Recommender Systems In Proceedings
50
of the 7th International Intelligent User Interfaces Conference pages 127ndash134 2002 ISBN
1581134592 doi 101145502716502737
[18] K Yu A Schwaighofer V Tresp X Xu and H P Kriegel Probabilistic Memory-Based Collab-
orative Filtering IEEE Transactions on Knowledge and Data Engineering 16(1)56ndash69 2004
ISSN 10414347 doi 101109TKDE20041264822
[19] R Burke Hybrid recommender systems Survey and experiments User Modeling and User-
Adapted Interaction 12(4)331ndash370 2012 ISSN 09241868 doi 101109DICTAP2012
6215409
[20] P Melville R J Mooney and R Nagarajan Content-boosted collaborative filtering for im-
proved recommendations In Proceedings of the Eighteenth National Conference on Artificial
Intelligence pages 187ndash192 2002 ISBN 0262511290 doi 1011164936
[21] M Ueda S Asanuma Y Miyawaki and S Nakajima Recipe Recommendation Method by
Considering the User rsquo s Preference and Ingredient Quantity of Target Recipe In Proceedings
of the International MultiConference of Engineers and Computer Scientists pages 519ndash523
2014 ISBN 9789881925251
[22] J Freyne and S Berkovsky Recommending food Reasoning on recipes and ingredi-
ents In Proceedings of the 18th International Conference on User Modeling Adaptation
and Personalization volume 6075 LNCS pages 381ndash386 2010 ISBN 3642134696 doi
101007978-3-642-13470-8 36
[23] D Billsus and M J Pazzani User modeling for adaptive news access User Modelling
and User-Adapted Interaction 10(2-3)147ndash180 2000 ISSN 09241868 doi 101023A
1026501525781
[24] M Deshpande and G Karypis Item Based Top-N Recommendation Algorithms ACM Trans-
actions on Information Systems 22(1)143ndash177 2004 ISSN 10468188 doi 101145963770
963776
[25] C-j Lin T-t Kuo and S-d Lin A Content-Based Matrix Factorization Model for Recipe Rec-
ommendation volume 8444 2014
[26] R Kohavi A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Se-
lection International Joint Conference on Artificial Intelligence 14(12)1137ndash1143 1995 ISSN
10450823 doi 101067mod2000109031
51
Chapter 6
Conclusions
In this MSc dissertation the applicability of content-based methods in personalized food recom-
mendation was explored Using the well-known Rocchio algorithm several approaches were tested
to further explore the breaking of recipes down into ingredients presented in [22] and use more
variables related to personalized food recommendation
Recipes were represented as vectors of features weights determined by their Inverse Recipe
Frequency (Eq 44) The Rocchio algorithm was never explored in food recommendation so vari-
ous approaches were tested to build the usersrsquo prototype vector and transform the similarity value
returned by the algorithm into a rating value needed to compute the performance of the recommen-
dation system When building the prototype vectors the approach that returned the best results
used a fixed threshold to differentiate positive and negative observations The combination of both
user and item average ratings and standard deviations demonstrated the best results to transform
the similarity value into a rating value These approaches combined returned the best performance
values of the experimental recommendation component
After determining the best approach to adapt the Rocchio algorithm to food recommendations
the similarity threshold test was performed to adjust the algorithm and seek improvements in the
recommendation results The final results of the experimental component showed improvements in
the recommendation performance when using the Foodcom dataset With the Epicurious dataset
some baselines like the content-based method implemented in YoLP registered lower error values
Being two datasets with very different characteristics not improving the baseline results in both
was not completely unexpected In the Epicurious dataset the recipe ingredient information only
contained its main ingredients which were chosen by the user in the moment of the review opposed
to the full ingredient information that recipes have in the Foodcom dataset This removes a lot of
detail both in the recipes and in the prototype vectors and adding the major difference in the dataset
sizes these could be some of the reasons why the difference in performance was observed
The datasets used in this work were the only ones found that better suited the objective of the
47
experiments ie that contained user reviews allowing to validate the studied approaches Since
there are very few studies related to food recommendations the features that better describe the
recipes are still undefined The feature study performed in this work which explored all the features
available in both datasets ingredients cuisines and dietaries shows that the use of all features
combined outperforms every feature individually or other pairwise combinations
61 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method ex-
plored in this work would be an interesting experiment for a future work As mentioned in Section
32 by implementing this hybrid approach a performance increase of 92 as measured with the
MAE metric was obtained when compared to a pure content-based method [20] This experiment
would determine if a similar decrease in the MAE could be achieved by implementing this hybrid
approach in the food recommendation domain
The experimental component can be configured to include more variables in the recommendation
process for example season of the year (ie winterfall or summerspring) time of the day (ie
lunch or dinner) total meal cost total calories amongst others The study of the impact that these
features have on the recommendation is also another interesting point to approach in the future
when datasets with more information are available
Instead of representing users as classes in Rocchio a set of class vectors created for each
user could represent their preferences From the user rated recipes each class would contain the
features weights related with a specific rating value observation When recommending a recipe its
feature vector is compared with the userrsquos set of vectors so according to the userrsquos preferences the
vector with the highest similarity represents the class where the recipe fits the best
Using this method removes the need to transform the similarity measure into a rating since the
class with the highest similarity to the targeted recipe would automatically attribute it a predicted
rating
48
Bibliography
[1] U Hanani B Shapira and P Shoval Information filtering Overview of issues research and
systems User Modeling and User-Adapted Interaction 11203ndash259 2001 ISSN 09241868
doi 101023A1011196000674
[2] D Jannach M Zanker A Felfernig and G Friedrich Recommender Systems An Introduction
volume 40 Cambridge University Press 2010 ISBN 9780521493369
[3] M Ueda M Takahata and S Nakajima Userrsquos food preference extraction for personalized
cooking recipe recommendation In CEUR Workshop Proceedings volume 781 pages 98ndash
105 2011
[4] D Jannach M Zanker M Ge and M Groning Recommender Systems in Computer
Science and Information Systems - A Landscape of Research In E-Commerce and Web
Technologies pages 76ndash87 Springer Berlin Heidelberg 2012 ISBN 978-3-642-32272-3
doi 101007978-3-642-32273-0 7 URL httplinkspringercomchapter101007
978-3-642-32273-0_7
[5] P Lops M D Gemmis and G Semeraro Recommender Systems Handbook Springer US
Boston MA 2011 ISBN 978-0-387-85819-7 doi 101007978-0-387-85820-3 URL http
linkspringercom101007978-0-387-85820-3
[6] G Salton Automatic text processing volume 14 Addison-Wesley 1989 ISBN 0-201-12227-8
URL httpwwwcshujiacil~dbilecturesirIntroduction-IRpdf
[7] M J Pazzani and D Billsus Content-Based Recommendation Systems The Adaptive Web
4321325ndash341 2007 ISSN 01635840 doi 101007978-3-540-72079-9 URL httplink
springercom101007978-3-540-72079-9
[8] K Crammer O Dekel J Keshet S Shalev-Shwartz and Y Singer Online Passive-Aggressive
Algorithms The Journal of Machine Learning Research 7551ndash585 2006 URL httpdl
acmorgcitationcfmid=1248566
49
[9] A McCallum and K Nigam A Comparison of Event Models for Naive Bayes Text Classification
In AAAIICML-98 Workshop on Learning for Text Categorization pages 41ndash48 1998 ISBN
0897915240 doi 1011461529 URL httpciteseerxistpsueduviewdocdownload
doi=1011659324amprep=rep1amptype=pdf
[10] Y Yang and J O Pedersen A Comparative Study on Feature Selection in Text Cat-
egorization In Proceedings of the Fourteenth International Conference on Machine
Learning pages 412ndash420 1997 ISBN 1558604863 doi 101093bioinformatics
bth267 URL httpciteseerxistpsueduviewdocdownloaddoi=1011
329956amprep=rep1amptype=pdf$delimiter026E30F$npapers2publicationuuid
23DB36B5-2348-44C4-B831-DBDD6EC7702D
[11] J S Breese D Heckerman and C Kadie Empirical analysis of predictive algorithms for
collaborative filtering In Proceedings of the 14th Conference on Uncertainty in Artificial In-
telligence pages 43ndash52 1998 ISBN 155860555X doi 101111j1553-2712201101172
x URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitleEmpirical+
Analysis+of+Predictive+Algorithms+for+Collaborative+Filtering0
[12] G Adomavicius and A Tuzhilin Toward the Next Generation of Recommender Systems A
Survey of the State-of-the-Art and Possible Extensions IEEE Transactions on Knowledge and
Data Engineering 17(6)734ndash749 2005
[13] N Lshii and J Delgado Memory-Based Weighted-Majority Prediction for Recommender
Systems In ACM SIGIR rsquo99 Workshop on Recommender Systems Algorithms and Eval-
uation 1999 URL httpscholargooglecomscholarhl=enampbtnG=Searchampq=intitle
Memory-Based+Weighted-Majority+Prediction+for+Recommender+Systems2
[14] A Nakamura and N Abe Collaborative Filtering using Weighted Majority Prediction Algorithms
In Proceedings of the Fifteenth International Conference on Machine Learning pages 395ndash403
1998
[15] B Sarwar G Karypis J Konstan and J Riedl Item-based collaborative filtering recom-
mendation algorithms In Proceedings of the 10th International Conference on World Wide
Web pages 285ndash295 2001 ISBN 1581133480 doi 101145371920372071 URL http
dlacmorgcitationcfmid=372071
experiments, i.e., that contained user reviews, allowing us to validate the studied approaches. Since there are very few studies related to food recommendation, the features that best describe recipes remain undefined. The feature study performed in this work, which explored all the features available in both datasets (ingredients, cuisines, and dietaries), shows that using all features combined outperforms each feature individually, as well as every pairwise combination.
6.1 Future Work
Implementing a content-boosted collaborative filtering system using the content-based method explored in this work would be an interesting experiment for future work. As mentioned in Section 3.2, this hybrid approach achieved a performance increase of 9.2%, as measured with the MAE metric, when compared to a pure content-based method [20]. Such an experiment would determine whether a similar decrease in the MAE can be achieved by applying this hybrid approach in the food recommendation domain.
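As a rough sketch of the idea behind content-boosted collaborative filtering (the function names and data layout below are illustrative, not part of the system developed in this work): a content-based predictor fills in the missing cells of the user–item rating matrix, and a neighbourhood-based collaborative predictor is then run over the resulting dense pseudo-ratings matrix.

```python
from math import sqrt

def pseudo_matrix(ratings, content_predict, users, items):
    # Fill each missing (user, item) cell with a content-based prediction,
    # producing the dense pseudo-ratings matrix used by the hybrid approach.
    return {u: {i: ratings.get(u, {}).get(i, content_predict(u, i))
                for i in items}
            for u in users}

def pearson(a, b, items):
    # Pearson correlation between two users' (pseudo-)rating vectors.
    ma = sum(a[i] for i in items) / len(items)
    mb = sum(b[i] for i in items) / len(items)
    num = sum((a[i] - ma) * (b[i] - mb) for i in items)
    den = (sqrt(sum((a[i] - ma) ** 2 for i in items)) *
           sqrt(sum((b[i] - mb) ** 2 for i in items)))
    return num / den if den else 0.0

def predict(pm, user, item, items):
    # Similarity-weighted average of the other users' pseudo-ratings.
    sims = [(pearson(pm[user], pm[v], items), v) for v in pm if v != user]
    num = sum(s * pm[v][item] for s, v in sims)
    den = sum(abs(s) for s, _ in sims)
    return num / den if den else sum(pm[user].values()) / len(items)
```

Because the pseudo-matrix is dense, the collaborative step no longer suffers from sparsity, which is where the reported MAE improvement over a pure content-based predictor comes from.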
The experimental component can also be configured to include more variables in the recommendation process, for example the season of the year (i.e., winter/fall or summer/spring), the time of day (i.e., lunch or dinner), total meal cost, or total calories, among others. Studying the impact that these features have on the recommendations is another interesting direction for future work, once datasets with more information become available.
Instead of representing each user as a single class in Rocchio, a set of class vectors created for each user could represent their preferences. From the user's rated recipes, each class vector would aggregate the feature weights associated with a specific rating value. When recommending a recipe, its feature vector would be compared with the user's set of class vectors; the vector with the highest similarity identifies the class where the recipe fits best, according to the user's preferences.
This method removes the need to transform the similarity measure into a rating, since the class with the highest similarity to the target recipe automatically assigns it a predicted rating.
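A minimal sketch of this per-rating class-vector variant follows (the function names are illustrative and assume dense recipe feature vectors; any vector representation of the features discussed above would work):

```python
from collections import defaultdict
from math import sqrt

def build_class_vectors(rated_recipes, ratings):
    # One Rocchio-style prototype (the mean feature vector) per rating value,
    # built from the recipes the user rated with that value.
    sums, counts = {}, defaultdict(int)
    for vec, r in zip(rated_recipes, ratings):
        if r not in sums:
            sums[r] = list(vec)
        else:
            sums[r] = [s + v for s, v in zip(sums[r], vec)]
        counts[r] += 1
    return {r: [s / counts[r] for s in sums[r]] for r in sums}

def cosine(a, b):
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den else 0.0

def predict_rating(recipe_vector, class_vectors):
    # The rating whose class vector is most similar to the recipe
    # is attributed directly as the predicted rating.
    return max(class_vectors,
               key=lambda r: cosine(recipe_vector, class_vectors[r]))
```

Note that the prediction is the class label itself, so no mapping from similarity scores to the rating scale is needed.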
Bibliography
[1] U. Hanani, B. Shapira, and P. Shoval. Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction, 11:203–259, 2001. ISSN 0924-1868. doi: 10.1023/A:1011196000674.
[2] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010. ISBN 9780521493369.
[3] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In CEUR Workshop Proceedings, volume 781, pages 98–105, 2011.
[4] D. Jannach, M. Zanker, M. Ge, and M. Groning. Recommender Systems in Computer Science and Information Systems - A Landscape of Research. In E-Commerce and Web Technologies, pages 76–87. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32272-3. doi: 10.1007/978-3-642-32273-0_7.
[5] P. Lops, M. D. Gemmis, and G. Semeraro. Recommender Systems Handbook. Springer US, Boston, MA, 2011. ISBN 978-0-387-85819-7. doi: 10.1007/978-0-387-85820-3.
[6] G. Salton. Automatic Text Processing. Addison-Wesley, 1989. ISBN 0-201-12227-8.
[7] M. J. Pazzani and D. Billsus. Content-Based Recommendation Systems. The Adaptive Web, 4321:325–341, 2007. ISSN 0163-5840. doi: 10.1007/978-3-540-72079-9.
[8] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research, 7:551–585, 2006.
[9] A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48, 1998. ISBN 0897915240.
[10] Y. Yang and J. O. Pedersen. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 412–420, 1997. ISBN 1558604863.
[11] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. ISBN 155860555X.
[12] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[13] N. Ishii and J. Delgado. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[14] A. Nakamura and N. Abe. Collaborative Filtering using Weighted Majority Prediction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 395–403, 1998.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, 2001. ISBN 1581133480. doi: 10.1145/371920.372071.
[16] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[17] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl. Getting to Know You: Learning New User Preferences in Recommender Systems. In Proceedings of the 7th International Intelligent User Interfaces Conference, pages 127–134, 2002. ISBN 1581134592. doi: 10.1145/502716.502737.
[18] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H. P. Kriegel. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering, 16(1):56–69, 2004. ISSN 1041-4347. doi: 10.1109/TKDE.2004.1264822.
[19] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002. ISSN 0924-1868.
[20] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 187–192, 2002. ISBN 0262511290.
[21] M. Ueda, S. Asanuma, Y. Miyawaki, and S. Nakajima. Recipe Recommendation Method by Considering the User's Preference and Ingredient Quantity of Target Recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, pages 519–523, 2014. ISBN 9789881925251.
[22] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization, volume 6075 of LNCS, pages 381–386, 2010. ISBN 3642134696. doi: 10.1007/978-3-642-13470-8_36.
[23] D. Billsus and M. J. Pazzani. User modeling for adaptive news access. User Modeling and User-Adapted Interaction, 10(2-3):147–180, 2000. ISSN 0924-1868. doi: 10.1023/A:1026501525781.
[24] M. Deshpande and G. Karypis. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004. ISSN 1046-8188. doi: 10.1145/963770.963776.
[25] C.-J. Lin, T.-T. Kuo, and S.-D. Lin. A Content-Based Matrix Factorization Model for Recipe Recommendation. Volume 8444, 2014.
[26] R. Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference on Artificial Intelligence, 14(12):1137–1143, 1995. ISSN 1045-0823.