REGULAR PAPER
A recommender system for the TV on the web: integrating unrated reviews and movie ratings
Filipa Peleja • Pedro Dias • Flavio Martins •
Joao Magalhaes
© Springer-Verlag Berlin Heidelberg 2013
Abstract The activity of Social-TV viewers has grown
considerably in the last few years—viewers are no longer
passive elements. The Web has socially empowered the
viewers in many new ways; for example, viewers can now rate TV programs, comment on them, and suggest TV shows to friends through Web sites. Some innovations
have been exploring these new activities of viewers but we
are still far from realizing the full potential of this new
setting. For example, social interactions on the Web, such
as comments and ratings in online forums, create valuable
feedback about the targeted TV entertainment shows. In
this paper, we address this last setting: a media recom-
mendation algorithm that suggests recommendations based
on users’ ratings and unrated comments. In contrast to
similar approaches that are only ratings-based, we propose
the inclusion of sentiment knowledge in recommendations.
This approach computes new media recommendations by
merging media ratings and comments written by users
about specific entertainment shows. This contrasts with
existing recommendation methods that explore ratings and
metadata but do not analyze what users have to say about
particular media programs. In this paper, we argue that text
comments are excellent indicators of user satisfaction.
Sentiment analysis algorithms offer an analysis of the
users’ preferences even when the comments are not associated with an explicit rating. Thus, this analysis will also have an impact on the popularity of a given media show. The recommendation algorithm—based on
matrix factorization by Singular Value Decomposition—
will consider both explicit ratings and the output of senti-
ment analysis algorithms to compute new recommenda-
tions. The implemented recommendation framework can
be integrated into a Web TV system where users can view and comment on entertainment media from a video-on-demand service. The recommendation framework was
evaluated on two datasets from IMDb with 53,112 reviews
(50 % unrated) and Amazon entertainment media with
698,210 reviews (26 % unrated). Recommendation results
with ratings and the inferred preferences—based on the
sentiment analysis algorithms—exhibited an improvement
over ratings-only recommendations. This result
illustrates the potential of sentiment analysis of user com-
ments in recommendation systems.
Keywords Social-TV · Recommendation · Reviews analysis · Sentiment analysis · Opinion mining
1 Introduction
The times when one would sit in front of the TV and passively watch a fixed list of broadcast shows are giving way to more interactive services such as video-on-demand. The increasing competition from computers and mobile devices, as novel ways of accessing entertainment in multiple places, has pushed TV viewing into a more personalized and interactive experience. TV
F. Peleja � P. Dias � F. Martins � J. Magalhaes (&)
Departamento de Informatica, Centro de Informatica
e Tecnologias da Informacao, Faculdade de Ciencias
e Tecnologia, Universidade Nova de Lisboa,
2829-516 Caparica, Portugal
e-mail: [email protected]
F. Peleja
e-mail: [email protected]
P. Dias
e-mail: [email protected]
F. Martins
e-mail: [email protected]
Multimedia Systems
DOI 10.1007/s00530-013-0310-8
entertainment has grown well beyond the traditional
broadcast paradigm. The difference between the TV and
the Web is fading away with new Internet-enabled TV devices and new Web applications delivering media entertainment from both amateur and professional media
producers.
Social-TV is a novel paradigm that has received much
attention in the last decade: research in this area has
brought us new technologies to support interaction among
users. The change is even greater than a first look might
indicate: TV shows go far beyond the TV set and extend
themselves onto the Web with extra media entertainment,
online forums, discussions, collectors’ art, etc. People not
only expect rich interactive experiences, they are eager for
more social and engaging entertainment offered by active
services that "sense" their mood and preferences.
In this paper, we argue that social interaction is not
limited to interactions occurring during the show. People
interact after the show in online forums and other social-media Web sites, where they leave valuable feedback about the show they have watched. For instance, a recent multinational Nielsen survey of device owners (http://blog.nielsen.com/nielsenwire/?p=31338) observed that 88 % of respondents in the US and 80 % in the UK claim to use another device while watching TV at least once a month.
Thus, the true strength of Social-TV lies in the collaborative interaction and in the correct interpretation of users’ most valuable feedback. Our proposal is to improve
media recommendations (available through a video-on-
demand (VoD) service) by exploring social interactions
among users. We propose a novel media entertainment
system that computes recommendations based on user
ratings and comments.
Product recommendations have become increasingly
popular in e-commerce services such as in Amazon Instant
Video service1 and Netflix service2 [25]. The main goal of
a recommender system is to suggest unknown items
(movies or other entertainment media) by considering the
information exchanged by users when interacting with the
system. Until recently, users commonly asked for a rec-
ommendation from their own circle of known friends or
family. However, recommendations demand a certain level
of trustworthy knowledge and not everyone is eligible to
provide a skilled recommendation. Thus, a recommender system that observes the viewpoints of diverse users across many movies may offer more reliable and insightful recommendations than the average user, whose own awareness covers only a small fraction of the vast number of different movies.
In general, two families of algorithms inspire Recom-
mender Systems (RS): content-based and collaborative
filtering. Broadly, to predict movies that might be of
interest to particular users, the content-based approach uses
a correlation analysis between users’ personal information
and movies metadata, while the collaborative-filtering
approach detects user-movie rating patterns over time.
The main difference between these two strategies relies
on the nature of the information used to build the recom-
mender system. In content-based approaches, information
related to users and movies is obtained manually and is therefore expensive to collect. In contrast, collaborative-filtering approaches automatically identify future preferences by observing users’ interactions, i.e., user-movie ratings and users’ comments/reviews.
Figure 1 depicts our approach: users rate media enter-
tainment shows and these ratings are received instanta-
neously by the recommender system; users can later
comment on the show in the online forum or fan zone; user
preferences are then inferred from the text comments and
merged into the recommendation algorithm. Since inferred
ratings are the exclusive result of a text sentiment analysis
algorithm, this approach can be seen as a weakly super-
vised recommendation algorithm.
The popularity of the information exchanged by media
consumers (and not only TV viewers) has been increasing
at an enormous rate. While some Web applications allow users both to rate and comment on a movie, others support only one of the two. For example, blogs and online forums
only support comments, and personal media players only
support ratings. Some authors, such as Takama and Muto
[36], have explored sentiment analysis techniques to infer
TV viewer profiles from comments. In contrast, we bypass
the extraction of user profiles and directly compute new
recommendations. Moreover, we describe a complete
framework that fuses sentiment analysis output with users’
explicit ratings to recommend new media. This integrated
approach ensures that no information is lost in the process of
creating a user profile. A comprehensive evaluation of our
framework illustrates how such an integrated approach can
Fig. 1 Recommendations based on ratings and reviews: reviews and ratings from a Social-TV online media forum feed a sentiment analysis stage whose inferred preferences, together with explicit ratings, drive the recommendation algorithm for TV shows
1 http://www.amazon.com.
2 http://www.netflix.com.
indeed compute more accurate recommendations. We also
identify the right settings to obtain the best results, i.e.,
compensating for biased or spam comments.
This paper is organized as follows. Section 2 discusses
some of the most relevant previous work. Section 3 offers
an overview of the proposed recommendation framework
and Sects. 4 and 5 discuss the details of the framework, more specifically the implemented sentiment analysis algorithm and the recommendation algorithm, respectively. Finally, the experimental evaluation and discussion on a real dataset are presented in Sect. 6.
2 Related work
2.1 Social-TV
According to Jenkins [20] and Haythornthwaite [16], media popularity is linked to social interactions in the new media.
They concluded that social ties and social media are
important to users’ media viewing habits. More recently,
Harboe et al. [15] conducted an experiment examining the
influence of social-TV watching: for example, a user watching television alone but interacting through instant notifications with friends or family who are watching the same, or another, program. The Web is making TV entertainment a social activity through online forums, instant
chatting, and other forms of technology-driven social
interaction.
Brown and Barkhuus [7] studied television-watching
practices among users of interactive media centers. They
have observed that iTV users are willing to signal some
minor preferences to receive personalized content. Uchyi-
git and Clark [41] proposed a similar system for the per-
sonalized generation of Electronic Program Guides (EPGs)
for digital TV. These approaches are a major step forward
in improving the viewer experience in TV, but their limited
user feedback and engagement has been a bottleneck in
these early systems. Thus, the amount of work in Web
recommendation systems [2, 25] shows the potential this
technology has for the TV domain. For example, Vildjio-
unaite and Kyllonen [42] addressed the issue of profiling
the preferences of a household. This implies modeling the
interaction of each individual (each child or each parent)
and the interaction of groups (just children, just parents, or
children with a parent).
To strengthen the adoption of program recommenda-
tions in iTV, the user must be deeply engaged. For
example, some approaches have adapted the live shows on-
the-fly according to users’ explicit votes/audiences col-
lected from set-top-boxes [46]. This example shows how
user feedback contributes to on-the-fly production of TV
shows through personalization for the masses. Thus,
exploiting social interactions is the key to modeling viewer
preferences and making TV entertainment more compelling.
Besides instant messages, set-top-box interaction data,
viewing habits, and online forum comments, other authors
have explored emotions [30]. Oliveira et al. [30] explored an
emotion-based approach to categorize, access, explore, and
visualize movies. In their system, named iFelt, users catego-
rize their movies according to the emotions they felt while
watching them. In contrast, our approach does not explicitly ask users for information, nor does it compute a user profile; instead,
we pervasively monitor and analyze user actions, and directly
embed this information in the recommendation algorithm.
2.2 Recommender systems
Delivering recommendation services on the TV domain
can be a non-trivial task since the distribution architecture
of the TV content does not favor interactivity. Xu et al. [43]
discuss a general system architecture and its building
blocks for delivering recommendation services to the TV
domain. Their setup integrates DVB-T and DVB-S televi-
sion systems with a WebTV service to provide interaction
mechanisms enabling TV program recommendations. The
Web part of their system overcomes the broadcast nature of
TV systems and allows user feedback and the delivery of
recommended TV programs. The MPEG-7 multimedia
description standard supports the description of a user TV-
usage history. Ferman et al. [14] propose a fuzzy algorithm
to compute the usage history descriptor. A filtering agent
then combines user preferences, usage history, and content
metadata to compute recommendations.
Recommender systems for the TV domain face several specific difficulties, and in some cases aggravated ones. Baudisch [4] considered collaborative-filtering
approaches and the cold-start problem in the TV domain.
The user burden and tolerance in interacting with such
systems can become a serious disincentive. TV viewers are
accustomed to zero-effort, thus Bausdisch suggest using
opinion leaders. TV viewing is rarely an individual expe-
rience, and most of the times the TV watching is a shared
experience (traditionally the family members). Thus, rec-
ommendations for groups of users are an important aspect
for the TV domain (even if users are not physically toge-
ther as in the Social-TV paradigm).
In this paper, we argue that in Social-TV users not only
rate TV programs but also comment and discuss the movie,
show, etc. Recommender systems can rely on user ratings
as in traditional collaborative filtering, or they can also explore
other data generated by users. Many researchers have
developed different strategies for exploring user feedback
in recommender systems (RS). Most of these approaches
gather movie ratings (explicitly provided by users) and
exploit this data as a collaborative-filtering task [25].
Within collaborative-filtering approaches [23–25], the
matrix factorization methods have proven their superiority over other methods (e.g. nearest neighbor). Koren’s
work [23] also showed that ratings exhibit temporal pat-
terns linked to seasonal purchases (e.g. Christmas) and
other time-dependent events. In all cases, matrix factorization approaches have proven capable of modeling the various details of the problem data. In this paper, we
also follow a matrix factorization approach.
Explicit ratings by themselves can be a limited metric for assessing user opinion about a movie. In some cases, such information can be very scarce: when a movie is of low quality, users often simply do not bother to rate it. In contrast, users may discuss, or
exchange impressions, about the movie. However, as Jakob
et al. [19] point out, most recommendation algorithms focus on explicit ratings and user/product characteristics, disregarding the information enclosed in the free-text
reviews. In addition, to the best of our knowledge, only a
few studies have proposed to integrate sentiment analysis
with recommendation algorithms [1, 19, 26, 28, 47].
Leung et al. [26] suggested inferring ratings from
reviews and integrating them with a collaborative-filtering
(CF) approach. The authors tackle the extraction of mul-
tilevel ratings by proposing a new method of identifying
opinion words, semantic orientation and corresponding
strength. This method allows similar words to receive different semantic orientation values. For example, the words terrible
and frightening may seem similar but in some domains
(e.g. movie) frightening is likely to be applied in a positive
context. However, in contrast to the present work, Leung
et al. [26] did not evaluate the recommendation part, having only assessed the sentiment strength and orientation of the opinion words.
In the movies domain, Jakob et al. [19] present the
advantages of improving recommendations with the senti-
ment extracted from user reviews. According to the
authors, the sentiment words should be split into clusters
where each cluster corresponds to different movie aspects.
Hence, the overall sentiment regarding a movie is mea-
sured by observing the sentiment words within these
clusters. In Jakob et al.’s approach, the sentiment review information is supplied to the recommendation algorithm as feature vectors. In comparison to Jakob et al. [19], where recommendations always need explicit ratings, we infer ratings and their associated confidence values from reviews. Hence, unlike Jakob et al.’s proposal, the ratings inferred by sentiment analysis are not directly combined with the explicit ratings. In addition, Jakob et al. use manual and automatic clustering algorithms to infer the movie aspects upon which users express an opinion. In contrast, we do not require a complete set of ratings for all reviews or fine-grained aspects of the reviewed movies.
In a more recent study, Zhang et al. [47] propose a comprehensive approach to sentiment-based recommendation algorithms on an online video service. Their system extracts "like"/"don’t like" information from reviews and users’ facial expressions, and relates the comments to the video description through a keyword vector-space model. In Zhang et al.’s approach, the inferred prediction is based on an unsupervised sentiment classification. In addition, regarding the CF recommender system, our approach differs in how the inferred ratings are incorporated: Zhang et al. build a list of keywords that are combined in a users matrix, a products matrix, and a ratings matrix.
To handle the sparsity of ratings, Moshfeghi et al. [28]
propose to improve a RS algorithm by considering not only
ratings but also emotion and semantic spaces to better
describe the movies’ and users’ space. The Latent Dirichlet
Allocation is used to compute a set of latent groups of
users. Moshfeghi et al.’s evaluation showed that such a hybrid approach (combining ratings with additional spaces extracted from metadata) outperforms ratings-only approaches and reduces the effects of cold-start.
In a different study, Aciar et al. [1] propose to analyze users’ reviews by developing an ontology to translate the review text. Aciar et al.’s [1] ontology relies on observing review positiveness, negativeness, and the users’ skill level. However, an important part of their work
relies on a manually created ontology capturing related
words. In addition, the training examples are manually
collected and labeled. Nonetheless, this study presents an
initial approach where the recommender system is based on
the reviews. The related-word concepts allow the identifi-
cation of co-related product characteristics (features). For
instance, in this domain the concept "carry" is related to the concept "size". Thus, Aciar et al.’s ontology measures
the quality of the several features within a product to create
user recommendations. Unlike the approaches to recom-
mendation systems from Aciar et al. [1], Jakob et al. [19],
and Leung et al. [26], we do not use any manual lexicons or initialization, and our focus is on the integration of ratings and unrated comments.
2.3 Sentiment analysis (SA)
Sensing the mood and the preferences of users through text
analysis techniques is a research area that has been quite
active in the last decade. From the first techniques of
review analysis [31, 38], to more recent techniques of
tweets analysis [6, 10], the field has progressed much.
Nonetheless, sentiment classification is commonly tackled with binary classifiers, even though the specific characteristics of different types of products, or rating scales more closely related to the domain, suggest a multiclass classification rather than the simplistic view of positive versus negative [35].
In a sentiment analysis study, one of the most important tasks is identifying which words express a sentiment. Similarly to a text categorization task in which not
every word is related to a topic, not every word is qualified
with sentiment intensity. In this context, a word associated
with a sentiment intensity is also referred to as opinion
word. In [40], Turney and Littman reported a study showing that sentiment classification improves when using only adjectives. Nonetheless, other studies [12, 17, 37, 39]
reveal that adverbs, nouns, and verbs are also qualified with
sentiment intensity. Hence, in our work, we will consider
these aforementioned word-families as opinion words.
Initial research in SA aimed at understanding "how and which words" humans use to express their preferences
[27]. Turney [38] aimed at assessing the positiveness or
negativeness of an opinion word through a method called
Semantic Orientation. It assumes that the correlation between a word and two reference words ("excellent" and "poor") indicates the orientation of the sentence (positive or negative). Turney measures the correlation with the
Point-wise Mutual Information-Information Retrieval
(PMI-IR) algorithm on the Web. Mullen and Collier [10]
observe that the choice of the words "excellent" and "poor" for the PMI-IR metric seems somewhat arbitrary.
However, further experiments led the authors to conclude
that those terms were the most appropriate. In this paper,
we conducted an evaluation of PMI-IR with different reference terms.
Opinion words can be identified through manual, corpus-based, or dictionary-based approaches. Since manual approaches are highly time-consuming, it is common to combine them with other automated methods [8, 44]. Typically, a corpus-based approach [11, 38] relies on identifying co-occurrence patterns, while dictionary-based approaches [18, 22] use a seed set of opinion words and a dictionary. A
popular linguistic resource in sentiment analysis is the
SentiWordNet dictionary, which provides an answer to the question of "how and which words" people use to express preferences. The lexical resource SentiWordNet,
introduced by Esuli and Sebastiani [13] and recently
revised by Baccianella et al. [3], is a lexicon created semi-
automatically by means of linguistic classifiers and human
annotation. In this context, a set of synonyms representing the same concept is referred to as a synset. In SentiWordNet,
each synset is annotated with its degree of positivity,
negativity, and neutrality (the same synset can express
opposite feelings). Previous studies using the opinion lex-
icon SentiWordNet for sentiment classification have shown
promising results [9, 29]. Ohana and Tierney [29] applied
the Support Vector Machine (SVM) classifier and a clas-
sifier that summed all the positive and negative features in
a review. Their evaluation on an IMDb corpus [32] indicates that SentiWordNet is an important resource for sentiment analysis tasks, although the best accuracy obtained for their
SVM classifier was 69.35 %. Denecke [9] evaluated a rule-
based classifier with the SentiWordNet on different
domains. Denecke showed that a rule-based classifier (RIPPER) performed worse than a logistic regression classifier. In the present study, we will use a gradient-descent classifier and measure the opinion word degree of
positivity and negativity with the SentiWordNet dictionary.
Previous approaches have shown that movie reviews are
among the most difficult ones to analyze. The most evident
result was obtained by Turney [38], who reached an accuracy of 66 % for movie reviews compared to an accuracy
of 80 % for automobile reviews. Despite this fact, we will
show how, given a sufficiently large number of movie
reviews, we can improve movie recommendation
techniques.
3 Sentiment-based recommendation framework
The goal of the proposed framework is to integrate, in one single recommendation framework, both explicit ratings and free-text comments with no associated rating.
The algorithm behind the recommendation framework
analyzes user comments and represents them together with user ratings in a collaborative matrix integrating the
interactions of all users. Figure 2 illustrates the proposed
sentiment-based recommendation framework. The
framework is divided into two parts: a comments analysis
algorithm to infer ratings from user comments and a
recommendation algorithm that merges all data into a
single sparse and highly incomplete matrix, to compute
new recommendations by matrix factorization. The two
following sections will detail both algorithms of the
framework.
The laboratory demonstrator where the recommendation
framework is integrated offers a social-media online ser-
vice for sharing, commenting, rating, and interacting with
movies. The Web TV demonstrator home page, Fig. 3, lists
the most popular movies and popular actors. Once the user
asks for personalized recommendations, the system
examines the user interactions with the online service and
computes a playlist recommendation. The playlist recom-
mendation is shown in full screen with the recommended
videos at the top of the screen (Fig. 4). The user is allowed
to comment on and rate all movies in the database. In Fig. 5,
the UI for the user interactions with the online service is
shown. The evaluation of this demonstrator is outside the
scope of this paper.
4 Comments analysis algorithm
The goal of the comments analysis algorithm is to analyze
the feedback that users write about a movie and infer a
preference in the form of ratings. To formulate the
problem, we consider a set of text reviews and their asso-
ciated rating,
$D = \{(re_1, ra_1), \ldots, (re_n, ra_n)\},$  (1)

where a comment/review $re_i$ is rated according to the value of $ra_i \in \{1, 2, 3, 4, 5\}$. Reviews are represented by a set of
opinion words (OW), i.e.
$re_i = (ow_{i,1}, \ldots, ow_{i,m}),$  (2)
where each component $ow_{i,j}$ represents the opinion word $j$ of the review $i$. An opinion word (OW) is a word that conveys a feeling or preference, e.g. "great" or "miserable". Moreover, for each OW, we identify the
semantic orientation (like or dislike) and quantify its
positiveness or negativeness. Therefore, the comments
analysis algorithm aims at learning a classification
function,
$\Phi(re_i) \mapsto [0, 1],$  (3)
Fig. 2 Proposed framework: unrated and rated comments flow through a comments analysis stage (corpus representation; orientation and intensity of words, using Semantic Orientation and SentiWordNet; comment classification), whose inferred ratings, together with explicit ratings, feed the sentiment-based recommendation stage (user/movie biases; matrix factorization; recommendations inference) that produces user-movie recommendations
Fig. 3 Media player with recommendations at the top
Fig. 4 Playing a video with recommendations at the top
to infer the rating of the review $re_i$. Following a machine learning approach, this function is learnt as a probabilistic model $p(ra_i \mid re_i)$ estimated from a training set.
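As a minimal sketch of this formulation (Eqs. (1)-(2)), the training set $D$ can be held as a list of (opinion-word tuple, star rating) pairs; the helper name and the example reviews below are illustrative, not the paper's data.

```python
# Illustrative sketch of D = {(re_1, ra_1), ..., (re_n, ra_n)}: each review
# re_i is a tuple of opinion words and each rating ra_i is on a 1-5 star scale.
def make_dataset(pairs):
    """Validate that every rating lies on the 1-5 star scale and return D."""
    for _review, rating in pairs:
        assert rating in {1, 2, 3, 4, 5}, "ratings must be 1-5 stars"
    return list(pairs)

D = make_dataset([
    (("great", "thrilling"), 5),   # re_1 = (ow_{1,1}, ow_{1,2}), ra_1 = 5
    (("dull", "miserable"), 1),    # re_2 = (ow_{2,1}, ow_{2,2}), ra_2 = 1
])
```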
4.1 Corpus representation
The most elementary representation of an OW is the bag-
of-words representation, i.e., a unigram-based representa-
tion. Pang et al. [6] claim that this representation delivers fairly good results in relation to a bigram, or adjective, representation. However, its simplicity may raise some doubts about its ability to describe a sentiment. For instance, the unigram representation may fail to capture strong opinions [7]. For this reason, each review $re_i = (ow_{i,1}, \ldots, ow_{i,m})$ is represented as a histogram of unigrams (isolated words) and frequent bigrams (adjective-word pairs).
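A possible sketch of this representation: count unigrams plus adjective-word bigrams in a tokenized review. The small adjective set below is a hypothetical stand-in for the part-of-speech tagging the full pipeline would rely on.

```python
# Sketch: represent a review as a histogram of unigrams and adjective-word
# bigrams (Sect. 4.1). ADJECTIVES is an illustrative stand-in for a POS tagger.
from collections import Counter

ADJECTIVES = {"great", "strong", "weak", "miserable"}  # hypothetical seed set

def review_histogram(tokens):
    """Count unigrams and (adjective, word) bigrams in a tokenized review."""
    counts = Counter(tokens)  # unigram counts
    for adj, nxt in zip(tokens, tokens[1:]):
        if adj in ADJECTIVES:
            counts[(adj, nxt)] += 1  # adjective-word bigram
    return counts

h = review_histogram(["a", "great", "movie", "with", "a", "strong", "cast"])
# h counts "a" twice, "great" once, and the bigram ("great", "movie") once
```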
4.2 Orientation and intensity of opinion words
The orientation of an OW is related to a word’s affinity
towards a positive or negative sense. Recently, Turney [8]
proposed a metric to estimate the orientation of a phrase
using the concept of the point-wise mutual information
(PMI), which is known for its ability to measure the
strength of semantic associations. The metric will measure
the degree of statistical dependence between the candidate
word and two reference words (i.e. a positive reference
word and a negative reference word). A high co-occurrence
between the candidate word and the positive word indicates
a positive sense, e.g. a high co-occurrence between "ice-cream" and the reference word "excellent". Since the
choice of reference words is of particular importance, the
algorithm presented in this paper considers the reference
words presented in Table 1.
The semantic orientation (SO) is computed by observing
the co-occurrence between the reference words and the
candidate word on the Web corpus. Thus, the semantic
orientation is given by the expression

$SO(word) = \dfrac{hits(word, \text{``excellent''}) \cdot hits(\text{``poor''})}{hits(word, \text{``poor''}) \cdot hits(\text{``excellent''})},$  (4)

where $hits(word)$ and $hits(word, \text{``excellent''})$ are given by the number of hits a search engine returns using these keywords as search queries.
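The ratio above can be sketched in a few lines. The `hits` callable and the toy counts below stand in for the Web search-engine hit counts the paper queries; they are illustrative values, not measured ones.

```python
# Sketch of the semantic-orientation ratio in Eq. (4). `hits` stands in for
# search-engine hit counts; the toy numbers below are purely illustrative.
def semantic_orientation(word, hits, pos_ref="excellent", neg_ref="poor"):
    """SO(word) = hits(word, pos) * hits(neg) / (hits(word, neg) * hits(pos))."""
    num = hits((word, pos_ref)) * hits((neg_ref,))
    den = hits((word, neg_ref)) * hits((pos_ref,))
    return num / den if den else float("inf")

# Toy hit counts: "superb" co-occurs far more often with "excellent".
toy_counts = {
    ("superb", "excellent"): 900, ("superb", "poor"): 100,
    ("excellent",): 5000, ("poor",): 5000,
}
so = semantic_orientation("superb", lambda q: toy_counts[q])
# so > 1 signals a positive orientation for "superb"
```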
Computing the semantic orientation can be computa-
tionally very demanding. The high dimensionality gener-
ated by the use of the bigram representation does not allow
the process to scale due to typical search engines’ querying constraints. Thus, the semantic orientation of the adjective-
Fig. 5 Media comments and ratings
Table 1 Semantic orientation and word polarity references

Technique | Word polarity references
T: Turney [8] | "excellent"/"poor"
G: Generic | "good"/"bad"
DS: Domain specific | "best movie"/"worst movie"
DS + T | "excellent movie"/"poor movie"
word pairs is replaced by the semantic orientation of the
adjective.
The semantic orientation determines the polarity of a
word but does not weight the intensity expressed by the
opinion word. OWs may express different intensity values,
for instance: ‘contented’ versus ‘ecstatic’ [4]. Thus, with a
lexical resource SentiWordNet [13], we identify the OW
strength. In this lexical resource, each feature is associated
with two numerical scores (positivity and negativity). In
this context, given the SO for a unigram, or bigram, Sen-
tiWordNet will return its sentiment strength. So, the Sen-
tiWordNet (swn) value of an OW will be given by,
$swn(ow) = \begin{cases} pos_{SWN}(ow), & SO(ow) > 1 \\ neg_{SWN}(ow), & SO(ow) \le 1 \end{cases}$  (5)
where $pos_{SWN}(ow)$ corresponds to the positive score value given by SentiWordNet and $neg_{SWN}(ow)$ corresponds to the negative score. For the adjective-word bigram representation, the score is measured as

$swn(adjective\text{-}word) = swn(adjective) + swn(word)$  (6)
Table 2 illustrates how the sentence "Love it or hate it, however someone tell me what on Earth…" is processed: according to each word’s family, we either discard it or determine its semantic orientation and SentiWordNet weight. Expression (5) is then applied to the weights of the opinion-word vector of Eq. (2) accordingly.
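Equations (5)-(6) can be sketched as follows. The tiny lexicon and the SO values here are hypothetical illustrations, not real SentiWordNet entries or measured orientations.

```python
# Sketch of Eqs. (5)-(6): choose the SentiWordNet positive or negative score
# of an opinion word depending on its semantic orientation. Both dictionaries
# below hold made-up illustrative values, not real SentiWordNet data.
SWN = {"love": (1.375, 0.0), "hate": (0.0, 0.75)}   # ow -> (posSWN, negSWN)
SO = {"love": 1.2, "hate": 0.8}                      # hypothetical SO(ow) values

def swn(ow):
    """Eq. (5): positive score if SO(ow) > 1, otherwise the negative score."""
    pos, neg = SWN[ow]
    return pos if SO[ow] > 1 else neg

def swn_bigram(adjective, word):
    """Eq. (6): the bigram score is the sum of the two word scores."""
    return swn(adjective) + swn(word)

# swn("love") picks posSWN (SO > 1); swn("hate") picks negSWN (SO <= 1)
```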
5 Classification
For classifying reviews, we used the Vowpal Wabbit (VW) framework.3 VW uses a linear classifier to assign a confidence value to a review. The classifier identifies the orientation
and intensity of all opinion words of a review $re_i = (ow_{i,1}, \ldots, ow_{i,m})$ and computes its rating based on the sigmoid function,

$\Phi(re_i) = \dfrac{1}{1 + \exp\left(-\sum_{j} ow_{i,j} \cdot w_j\right)}.$  (7)
The weights $w_j$ are learned with a gradient-descent algorithm that trains the function to distinguish between ratings 1 and 5. Other classifiers could have been used;
however, VW implements a stochastic gradient-descent
classifier for optimizing the square loss of a linear model.
In addition, VW operates in an entirely online fashion
overcoming practical limitations such as efficiency and
scalability [45].
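The linear-plus-sigmoid scoring of Eq. (7) can be sketched directly; this is a stand-alone stand-in for the Vowpal Wabbit classifier the paper actually uses, with hand-set (not learned) weights for illustration.

```python
# Sketch of Eq. (7): a linear model over opinion-word feature values squashed
# by the sigmoid. The feature values and weights below are illustrative only;
# in the paper the weights are learned by stochastic gradient descent in VW.
import math

def review_score(ow_values, weights):
    """Phi(re_i) = 1 / (1 + exp(-sum_j ow_{i,j} * w_j))."""
    z = sum(ow * w for ow, w in zip(ow_values, weights))
    return 1.0 / (1.0 + math.exp(-z))

# Two opinion-word features (e.g. a positive and a negative word strength)
# with hand-set weights: the first pushes toward rating 5, the second toward 1.
positive_review = review_score([1.375, 0.0], [2.0, -2.0])
negative_review = review_score([0.0, 0.75], [2.0, -2.0])
# positive_review lands above 0.5, negative_review below it
```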
6 Computing recommendations with ratings
and unrated comments
In this section, we describe the collaborative-filtering
algorithm that combines ratings with the comments anal-
ysis output. The algorithm decomposes the ratings matrix
into two new matrices representing movies and users. This
matrix factorization introduces a bias correction mecha-
nism to compensate for different users’ optimism and dif-
ferent movie popularities. A second step selects the comments to be merged with the explicit ratings based on the inference confidence and converts the probabilistic analysis output of the unrated comments into a 1–5 star scale.
6.1 Ratings-based recommendation
Among all recommending techniques, collaborative-filter-
ing approaches are the most widely adopted. Collaborative-
filtering techniques attempt to collaboratively infer users’
preferences towards products, by analyzing the user-movie
ratings matrix R. Each element of this matrix represents a
rating given by user u to movie i, expressed by a numeric
value. It is important to mention that since each user usu-
ally rates a very small portion of all available products, the
ratings matrix R is always sparsely filled. Thus, the purpose
of collaborative-filtering techniques is to work over the few
known ratings to predict the unknown ones.
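The sparsity discussed above is usually handled by storing only the observed entries of R, for example as a dictionary keyed by (user, movie); the users and movies below are made up for illustration.

```python
# Sparse representation of the ratings matrix R: only the known
# (user, movie) -> rating entries are stored. Entries are illustrative.
ratings = {
    ("alice", "movie_a"): 5,
    ("alice", "movie_b"): 3,
    ("bob", "movie_b"): 4,
}

def known_rating(user, movie):
    """Return the explicit rating if it exists, otherwise None
    (to be predicted by the collaborative-filtering model)."""
    return ratings.get((user, movie))
```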
Within collaborative-filtering techniques, latent factor
approaches are very popular [25]. A well-known alternative to latent factor approaches is the neighborhood method. However, neighborhood methods are limited to like-minded users; latent factor approaches can therefore discover a wider range of recommendations that neighborhood methods overlook. The purpose of latent factor
approaches to recommender systems is to map both users
Table 2 SentiWordNet weights for the sentence "Love it or hate it, however someone tell me what on earth…"

Word     Family  SO       posSWN  negSWN
Love     N       -0.0824  1.375   0.0
It       -       -        -       -
Or       -       -        -       -
Hate     V       0.8399   0.0     0.75
It       -       -        -       -
However  R       -0.3415  0.5     0.5
Someone  N       -0.6594  0.0     0.0
Tell     V       -0.3956  0.875   0.625
Me       -       -        -       -
What     -       -        -       -
On       -       -        -       -
Earth    N       -0.4041  0.0     0.625
3 https://github.com/JohnLangford/vowpal_wabbit/wiki.
and movies onto the same latent factor space, representing these as vectors with k dimensions:

$$p_u = (u_1, u_2, \ldots, u_k) \qquad q_i = (i_1, i_2, \ldots, i_k), \qquad (8)$$

where $p_u$ is the user $u$ factors vector, $q_i$ is the movie $i$ factors vector, and $k$ is the number of latent factors (dimensions) in which each user $u$ and each movie $i$ are represented. With this latent factor representation of users and movies, we intend to achieve a rating prediction rule to assess user preferences for movies, by calculating the dot product of their respective latent factor vectors, as follows:

$$\hat{r}_{ui} = p_u \cdot q_i, \qquad (9)$$

where $\hat{r}_{ui}$ is the predicted rating of user $u$ for movie $i$.
6.2 Matrix factorization through singular value
decomposition
As previously mentioned, the first step to obtain such
representation is to discover the latent factor space under-
lying the user-movie ratings matrix. The most widely
adopted category of techniques to discover this latent factor
space is matrix factorization, mainly through Singular
Value Decomposition (SVD). SVD is a technique to decompose the users-products matrix R into a product of three matrices, $U$, $\Sigma$ and $V$. The matrix $U$ contains the left singular vectors, $\Sigma$ contains the singular values, and $V$ contains the right singular vectors of the original matrix R.
Due to the vast number of users and movies in most real recommender systems, even if the ratings matrix R were full, it would be computationally expensive to compute a full SVD. Thus, what is pursued is a low-rank approximation to the SVD. Such a low-rank approximation can be obtained by zeroing out the less relevant (lower) singular values contained in matrix $\Sigma$, preserving only the $k$ most relevant ones. Notice that the number $k$ determines the dimensionality of the pursued latent factor space.
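On a small, fully observed matrix, this truncation can be sketched with NumPy. This is an illustration of the rank-k idea only; as discussed next, real recommenders learn the factors directly from the known ratings rather than computing a full SVD.

```python
import numpy as np

def rank_k_approx(R, k):
    """Low-rank approximation: zero out all but the k largest
    singular values of R and recompose the matrix."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s[k:] = 0.0                       # keep the k most relevant singular values
    return U @ np.diag(s) @ Vt

# Toy, fully observed "ratings" matrix, for illustration only.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
R_k = rank_k_approx(R, k=2)           # best rank-2 approximation of R
```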
The application of SVD to recommender systems is motivated by the desire to decompose the ratings matrix into a 2-matrix representation $R = P \cdot Q^T$, as

$$\begin{bmatrix} r_{1,1} & \cdots & r_{1,n} \\ \vdots & \ddots & \vdots \\ r_{m,1} & \cdots & r_{m,n} \end{bmatrix} = \begin{bmatrix} u_{1,1} & \cdots & u_{1,k} \\ \vdots & \ddots & \vdots \\ u_{m,1} & \cdots & u_{m,k} \end{bmatrix} \cdot \begin{bmatrix} p_{1,1} & \cdots & p_{1,k} \\ \vdots & \ddots & \vdots \\ p_{n,1} & \cdots & p_{n,k} \end{bmatrix}^T \qquad (10)$$
Again, matrix R is the ratings matrix where each rui
value represents a rating given by user u to movie i,
expressed by a real value. Each vector (row) pu of P
represents a user u and each vector (row) qi of Q represents
a movie i, as in Eq. 9. Again, the goal of using matrix factorization in recommendation problems is to enable the assessment of user preferences for movies by calculating the dot product of their factor vector representations, as previously defined by Eq. 9.
As previously mentioned, the original SVD is designed to be performed over a complete matrix, decomposing it into a product of three matrices. Thus, the SVD technique must undergo some modifications to deal with a sparsely filled ratings matrix and decompose it into a 2-matrix representation. In that sense, Simon Funk4 suggested an efficient solution to learn the factorization model, which has been widely adopted by other researchers [25]: decompose the ratings matrix into a product of a user-factor matrix with a movie-factor matrix by taking into account the set of known ratings only. Thus, matrices P and Q are given by:

$$[P, Q] = \arg\min_{p_u, q_i} \sum_{r_{ui} \in R} \left( r_{ui} - p_u \cdot q_i^T \right)^2 + \lambda \cdot \left( \|p_u\|^2 + \|q_i\|^2 \right) \qquad (11)$$
This expression accomplishes two goals: matrix factorization by minimization and the corresponding regularization. The first part of Eq. 11 pursues the minimization of the difference (henceforth referred to as error) between the known ratings present in the original ratings matrix R and their decomposed representation in P and Q. The second part controls generality by avoiding overfitting during the learning process, where $\lambda$ is a constant defining the extent of regularization, usually chosen by cross-validation.
6.3 Rating biases
Although the latent factor vector inference largely captures rating tendencies, some improvements can be made to the model by defining baseline predictors. These allow the factor vectors to simply adjust the baseline predictions towards the real rating values instead of having to fully capture the rating patterns on their own. A straightforward
choice for a baseline predictor is the global average of the
observed ratings. In addition, it is useful to account for the
fact that some users tend to give higher ratings than others
and some movies tend to get higher ratings than others, as
well. Based on this premise, arrangements can be made to
capture these rating trends, regarded as user-related or
movie-related deviations from the average rating, hence-
forth referred to as user and movie biases. This reasoning
leads to a new model where, by considering the global
rating average and biases, the prediction rule can be
modified into
4 http://sifter.org/~simon/journal/20061211.html.
$$\hat{r}_{ui} = p_u \cdot q_i + \mu + b_u + b_i. \qquad (12)$$

In the new prediction rule, the parameters $\mu$, $b_u$, and $b_i$ represent the global rating average, the user bias, and the movie bias, respectively. Accordingly, the new least-squares problem to solve, which is an extension of the regularized Eq. 11, becomes:

$$[P, Q] = \arg\min_{p_u, q_i} \sum_{r_{ui} \in R} (r_{ui} - \hat{r}_{ui})^2 + \lambda \cdot \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right) \qquad (13)$$
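A minimal SGD sketch of the biased factorization model of Eqs. (12) and (13) follows. The hyperparameters (number of factors k, learning rate, regularization λ) and the tiny rating set in the usage line are illustrative, not the settings used in the paper.

```python
import random

def train_biased_mf(ratings, n_users, n_items, k=2, lr=0.05, lam=0.02, epochs=300):
    """Learn P, Q and the biases of Eq. (13) by stochastic gradient
    descent over the known (user, item, rating) triples."""
    rng = random.Random(0)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    bu, bi = [0.0] * n_users, [0.0] * n_items
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global average
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            e = r - pred                                # prediction error
            bu[u] += lr * (e - lam * bu[u])
            bi[i] += lr * (e - lam * bi[i])
            for f in range(k):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (e * qf - lam * pf)
                q[i][f] += lr * (e * pf - lam * qf)

    def predict(u, i):
        """Eq. (12): mu + b_u + b_i + p_u . q_i."""
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))

    return predict

# Usage on a toy 2-user, 2-movie rating set (illustrative values).
predict = train_biased_mf([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)], 2, 2)
```

Note how each observed rating updates only the corresponding user and item parameters, which is what keeps the learning step cheap on a sparse R.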
7 Merging ratings and unrated comments
So far, the algorithm assumes the existence of a ratings
matrix R containing all user-movie ratings. This matrix is
by nature highly incomplete given the large number of
movies and users and the limited number of rated movies
per user. On average, a user may rate 30 movies from a set
of 2 million movies, and the remaining ratings are
unknown. Thus, the above ratings matrix can be made
more complete by adding ratings inferred by the sentiment
analysis of user comments. Figure 6 illustrates the process
described in this section.
Consider now a set $G = \{ra_1, \ldots, ra_n\}$ of explicit ratings assigned by users, and a set $F = \{re_1, \ldots, re_m\}$ of unrated comments written by users. Applying the sentiment analysis function $\Phi(\cdot)$, described in Eq. 7, to convert comments into probabilistic ratings, we define the $\Gamma$ function to quantize this value into a rating value, i.e.,

$$ra_i = \Gamma(\Phi(re_i)). \qquad (14)$$

This puts all ratings in the same format, $ra_i \in \{1, 2, 3, 4, 5\}$, and the union of both sets creates a new set $R = G \cup \Gamma(\Phi(F))$, which is easily represented as a ratings matrix with both explicit ratings and inferred ratings. The inclusion of the ratings inferred by the algorithm does not take into account the nature of the binary classifier. Thus, the final step is to filter out the less accurate inferred ratings.
Since the classifier discriminates between positive and negative reviews, inferred ratings with a probability around 0.5 have a higher uncertainty associated with them. Following this reasoning, the $\Gamma$ function filters out the unwanted ratings by imposing a threshold around the probability 0.5, ignoring these ratings before translating the classifier's output into inferred ratings. Formally, the $\Gamma$ function is expressed as

$$ra_i = \Gamma(\Phi(re_i)) = \begin{cases} \emptyset, & 0.5 - th < \Phi(re_i) < 0.5 + th \\ \mathrm{round}(4 \cdot \Phi(re_i) + 1), & \text{otherwise} \end{cases} \qquad (15)$$
where the threshold $th$ is used to discard ratings whose probabilities are close to 0.5; all other values are converted into an integer rating. For example, with th = 0.2, ratings in the interval $0.3 < \Phi(\cdot) < 0.7$ are discarded. This allows the recommendation algorithm to accept only ratings in which the confidence is greater. In the experimental evaluation, we further assess the influence of this threshold.
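The quantization of Eq. (15) can be written directly as below; a straightforward sketch in which `None` plays the role of the empty set, i.e., a discarded rating.

```python
def gamma(phi, th=0.2):
    """Eq. (15): discard classifier confidences near 0.5 and map the
    remaining ones onto the 1-5 star scale."""
    if 0.5 - th < phi < 0.5 + th:
        return None                     # too ambiguous: rating discarded
    return round(4 * phi + 1)

# e.g. gamma(0.95) -> 5, gamma(0.1) -> 1, gamma(0.6) -> None (inside 0.3-0.7)
```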
8 Experimental evaluation
8.1 Datasets
To evaluate the implemented recommendation framework,
a large-scale dataset with both reviews and ratings for
multiple users and media was required. Such a large dataset
is only available on large production sites such as IMDb,5 Amazon, and other VoD services. Three datasets were chosen to perform the evaluation: (1) the polarity dataset [32], widely used in sentiment analysis, containing 2,000 movie reviews from IMDb,6 (2) 698,210 movie and music reviews from Amazon [21], and finally, (3) the dataset used by Jakob et al. [19], which contains 53,112 movie reviews from IMDb, for comparison purposes. More specifically, we
have:
• Polarity: This dataset is used to validate the sentiment
analysis algorithm: it is evenly split into positive and
negative categories. We used 1,400 training and 600 test reviews, with positive and negative reviews equally divided.
• Amazon movies and music: This large-scale dataset is
used to blend sentiment analysis knowledge into the full recommender framework. This dataset7 was compiled
by [21] and includes reviews that are rated with 1–5
rating stars.
• IMDb-TSA09: This dataset was released by Jakob et al.
[19]. This data covers 2,731 movies and 509 users. The
reviews are rated with 1–10 rating stars. We have
chosen this dataset for comparison purposes.
Unlike the polarity dataset, the Amazon and IMDb-TSA09 datasets do not offer an equally distributed number of ratings across the scale (1-5, or positive vs. negative). Considering what motivates users to offer their insights about a movie, this lack of proportionality between positive and negative reviews is foreseeable. Since Amazon reviews are tied to purchases, we may intuitively say that the odds of a user acquiring a movie that displeases them are smaller than the odds of being pleased with the purchase, while
5 http://www.imdb.com.
6 http://www.cs.cornell.edu/people/pabo/movie-review-data.
7 http://131.193.40.52/data/.
regarding the movie domain, this notion is less intuitive. Table 3 presents the dataset details. Concerning the
positive and negative ratings range, we have followed the reasoning of Pang et al. [31] and others [5, 28, 34]: we trained and evaluated the sentiment analysis algorithm on the Amazon dataset and considered reviews with ratings of 4 or 5 as positive, otherwise negative. In addition, in the IMDb-TSA09 dataset, we considered reviews with ratings above 6 as positive, otherwise negative.
8.2 Evaluation metrics
The evaluation of the sentiment analysis algorithm is given
by the standard evaluation measures of precision, recall
and F-score, which is the harmonic mean between preci-
sion and recall,
$$Prec = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} (TP_i + FP_i)} \qquad Rec = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} (TP_i + FN_i)} \qquad (16)$$

$$F\text{-}score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (17)$$
where TP, the true positives, corresponds to all correctly
classified reviews as belonging to the class; TN, true
negative, to all the correctly classified as not belonging to
the class; FP, false positive, to all reviews misclassified as belonging to the class; and FN, false negative, to all reviews misclassified as belonging to another class.
To evaluate the RS framework, the statistical measure
root mean square error (RMSE) is applied,

$$RMSE = \sqrt{\frac{\sum_{r_{ui} \in R} (r_{ui} - \hat{r}_{ui})^2}{|R|}}, \qquad (18)$$

where R represents the set of ratings, $r_{ui}$ represents the rating given by user u to movie i, and $\hat{r}_{ui}$ represents the rating predicted by the RS algorithm. Smaller values of RMSE indicate a more accurate performance.
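The measures of Eqs. (16)-(18) translate directly into code; the sketch below takes per-class TP/FP/FN counts and rating lists as plain Python lists, with toy inputs for illustration.

```python
import math

def precision(tp, fp):
    """Eq. (16), left: micro-averaged precision over per-class counts."""
    return sum(tp) / (sum(tp) + sum(fp))

def recall(tp, fn):
    """Eq. (16), right: micro-averaged recall over per-class counts."""
    return sum(tp) / (sum(tp) + sum(fn))

def f_score(prec, rec):
    """Eq. (17): harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)

def rmse(actual, predicted):
    """Eq. (18): root mean square error between true and predicted ratings."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual))
```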
9 Experiment: SO reference words
Different reference words (see Table 1) were used to
compute the influence of the Semantic Orientation on the
sentiment analysis algorithm. Figure 7 quantifies this
effect. The standard Semantic Orientation, as proposed by Turney [38], reached an F-score of 74 % (the word references are "poor"/"excellent"). The Domain-Specific (DS) word references combined with Turney's proposal (T) improve the F-score by 4 % ("excellent movie"/"poor movie").
Thus, a careful selection of Domain Specific reference
words can lead to an improved classifier: context sensitive
words should be combined with strong negative or positive
words (‘‘excellent’’/‘‘poor’’).
10 Experiment: recommendation framework
To evaluate the influence of the comments analysis over
the RS algorithm, the datasets were randomly split into four disjoint subsets. The role of each subset is detailed in Table 4. These splits allow an unbiased evaluation with
new data for each training and test step.
10.1 Comments and ratings based recommendations
(Amazon dataset)
In order to integrate the inferred ratings in the recom-
mendation system, it is required to examine the SA ratings
confusion-matrix. The Amazon subset #1 was used to train
the algorithm and the subset #2 to test it.
Table 5 presents the ratings confusion-matrix between the predicted ratings and the actual ratings. The values in
bold in the diagonal correspond to the correctly classified
reviews. If the comments analysis part were completely
accurate, only the diagonal would be active. The right-most
column and the bottom row present the total number of
ratings for a given level. For example, there are 95,248
ratings of 5-star and the algorithm predicted 39,658 ratings
of 5-star. We aim at distinguishing between similar ratings,
which is a more challenging task than a binary classifica-
tion task (positive vs. negative) [35]. In addition, the users’
Table 3 Detailed information the datasets
Dataset #Reviews #Users #Movies
Polarity [32] 2,000 – –
IMDb-TSA09 [19] 53,112 509 2,731
Amazon [21] 698,210 3,700 8,018
Fig. 6 Merging ratings and comments can improve recommendation performance (users' original ratings are complemented with ratings computed by the sentiment analysis algorithm, yielding review-enhanced ratings and new recommendations)
reasoning when providing a rating, and the associated review, differs. Some users can prove to be more demanding, or more generous, than others. Nonetheless, we aim at feeding the inferred ratings into a recommendation algorithm in which ratings are re-adjusted through user and product biases ($b_u$ and $b_i$ in Eq. 12).
Inspecting the confusion-matrix (Table 5), one can see
how the number of reviews incorrectly classified as
belonging to other ratings are distributed. A more conve-
nient visualization of the confusion-matrix is offered by
Fig. 8. It can be observed that incorrectly predicted ratings
are usually in the neighboring ratings. This is also justified
by the nature of the data since users may write a review
with a rating of 4 while others not as demanding write a
similar review with a rating of 5 [33]. In addition, one can observe that the data is unbalanced: most of the ratings are 4 or 5, which implies a small number of low ratings. This explains why the confusion is greater among the low ratings.
Once the reviews are rated by the analysis algorithm, we
proceeded to the recommendation algorithm evaluation. In
this setting, we have three situations corresponding to the
three principal experiments:
1. RS Lower bound (LB): the system was trained on a
minimum set of ratings corresponding to subset #3 (see
Table 4). This establishes the lower bound on the
error.
2. RS Upper bound (UB): the system was trained on the
maximum number of ratings corresponding to the
union of the subsets #2 and #3 (see Table 4). This
establishes an upper bound on the error.
3. RS + OM (OM): the system was then trained on
explicit ratings (subset #3) and ratings inferred from
unrated reviews (subset #2). In this experiment, all
ratings of subset #2 are withheld.
The summary presented in Fig. 9 carries a strong message: comments analysis can indeed improve recommendations by examining, in addition to the ratings provided by the users, the reviews that they wrote. The RMSE obtained with OM brings to light a surprising result concerning the replacement of the explicit ratings (Fig. 9) by the inferred ratings. When using just explicit ratings for
the LB and UB, the RMSE was 1.0092 and 0.9963,
respectively. However, with the inferred ratings (OM), we
obtained a lower RMSE of 0.9845. Hence, we argue that
the inferred ratings (OM) can better accommodate the
uncertainty regarding the explicit rating assigned by users.
This is explained by the fact that some ratings are strongly
biased by users and a review provides a more complete
opinion. For example, users’ reviews that focus on
answering other reviews or unrelated information about the
movie (actors previous performances). Moreover, since the
inferred ratings are produced by a single algorithm, its bias
is unique, thus easier to be compensated by the recom-
mendation framework.
Figure 10 provides a detailed view of how the threshold value influences the recommendation quality (RMSE).
The upper bound (UB) and lower bound (LB) correspond
to the RS classifications where the OM has no influence.
The RS + OM curve includes the ratings from subset #3
and the ones inferred from subset #2 reviews (Table 4).
When OM inferred ratings are added to the set of LB rat-
ings, we see that the recommendation framework can
indeed improve the overall RMSE.
For a threshold th = 0.0, all inferred ratings are used by
the recommendation system; for a threshold th = 0.5, only
the inferred ratings with probabilities 0.0 and 1.0 are used.
As the threshold th increases, ratings with probabilities
near 0.5 are ignored (they are ambiguously positive or
negative). Thus, the higher the threshold, the fewer inferred
ratings are considered (# of OM ratings curve).
The RS + OM curve illustrates how the analysis of
unrated comments can indeed improve the RMSE of the
computed recommendations. As we exclude ratings closer
to probability 0.0 and 1.0, the RMSE increases until it
reaches its worst value for th = 0.5. This corresponds to
considering 1-star and 5-star inferred ratings. We believe
that this RMSE value is due to the high amount of 5-stars
spam reviews and to the wrath of some users when writing
Fig. 7 F-score on the polarity dataset
Table 4 Data splits for the recommendation evaluation

Split  #Reviews (Amazon)  #Reviews (IMDb-TSA09)  Description
#1     184,996            23,599                 Train comments analysis
#2     182,651            23,601                 Test comments analysis/train RS
#3     236,450            #1                     Train RS combined with #2
#4     94,113             5,912                  Test RS
1-star reviews. To better examine this behavior of the RS + OM curve, Fig. 11 presents an insightful look into the performance of the comments analysis algorithm.
Precision is quite high for 5-star ratings but it is extremely
low for the other rating levels—this is critical because the
recommendation algorithm needs both low and high rating
values. Recall is below 30 % for 1- and 2-star ratings and
above 30 % for 3-, 4- and 5-star ratings. These recall
values generate a small set of 1- and 2-star ratings. Note
that precision and recall measure the exact match between
the actual ratings and the inferred ratings. However, for the
recommendation algorithm what is most important is the
average error between the actual rating and the inferred
rating. In other words, we need to consider the mean
absolute error of each predicted rating.
Figure 11 illustrates the MAE curve (mean absolute
error) between the predicted ratings and the true ratings.
One can see that for 3- and 4-star ratings, the average
distance between the predicted and the true rating is less
than 1. Thus, this graph shows that noisier data is con-
centrated on ratings with 1-star and 5-stars, which clarifies
the RS + OM curve behavior.
10.2 Comments and ratings based recommendations
(IMDb-TSA09)
In this section, we compare our proposal to the approach of Jakob et al. [19]. While their approach can explore more media-related information, such as genres and actors, it does not consider unrated comments. The dataset used in this section (IMDb-TSA09) corresponds to the dataset used in
[19]. In this experiment, we trained the sentiment analysis
algorithm with the split #1 and inferred the sentiment
analysis ratings (OM) on split #2 (see Table 4). For the
recommendation algorithm, the split #1 was used to train
individually, and combined with the inferred ratings from
split #2. The split #4 was used to evaluate the recom-
mendation algorithm. In the first experiment (Fig. 12), our baseline performed better (RMSE = 1.819) than Jakob et al.'s (RMSE = 1.853) when considering the full set of ratings. In fact, our approach was slightly better than Jakob et al.'s when their approach included sentiment analysis (RMSE = 1.823) and we only had explicit ratings (RMSE = 1.819). Thus, one would expect that by including unrated reviews, our approach would increase this difference, since Jakob et al.'s approach is not designed to handle unrated reviews.
A second experiment (Fig. 13) was conducted on this
dataset to confirm the influence of the sentiment analysis
on the recommendation framework performance. We used
50 % of explicit ratings and 50 % of inferred ratings to achieve RMSE = 1.886, which is slightly better than just using 50 % of explicit ratings (RMSE = 1.896). Since this
dataset has a small set of reviews for training, we believe
that a finer-grain classifier or more training data can further
increase this gap. These experiments show how the pro-
posed approach compares to existing ones: despite being
competitive, it can also extract extra information from the
text reviews to infer unknown ratings, which makes it
applicable to a wider range of scenarios.
Table 5 Ratings confusion-matrix (rows: true rating values; columns: predicted rating values; right-most column: row totals)

       1       2       3       4       5       Total
1      1,723   1,497   2,205   994     32      6,451
2      2,000   2,086   2,946   1,468   98      8,598
3      2,931   3,806   7,494   6,178   1,173   21,582
4      2,369   4,253   13,410  21,848  8,892   50,772
5      1,695   3,892   16,447  43,751  29,463  95,248
Total  10,718  15,534  42,502  74,239  39,658
Fig. 8 Normalized predicted ratings distribution
Fig. 9 RMSE for LB, UB and OM blend with RS
11 Conclusion
In this paper, we proposed a recommendation framework
where ratings and unrated comments from media con-
sumers are blended to improve recommendations accuracy.
This is an ideal application for a cable operator wishing to implement a system that considers users' complaints and praises as measures of user satisfaction. The evaluation
with real user data illustrates the importance of revising
users’ explicit ratings with text analysis techniques.
The described evaluation shows that by applying senti-
ment analysis techniques to the unrated user reviews, we
can compute more accurate recommendations than by just
using explicit ratings. Following the presented evaluation,
we point out the following observations:
• The proposed recommendation framework can successfully integrate unrated reviews with ratings to
improve ratings-only recommendations. This has a
direct applicability to Social-TV environments where
users rate some of their consumed media or discuss the
media in online forums without rating it.
• The recommendation framework can be applied to filter
spam reviews or to add a review-based bias. Since all
reviews are analyzed by a common algorithm, consis-
tency is guaranteed across all inferred ratings.
Fig. 10 RMSE of the recommendations versus the opinion mining output. As the threshold increases, fewer $\Phi(re_i)$ ratings are included
Fig. 11 Sentiment analysis precision and recall per rating. The MAE
measure indicates the average distance to the true rating
Fig. 12 RMSE when only ratings are used
Fig. 13 RMSE improvement when sentiment analysis is added
• Reviews with extreme ratings (1-star or 5-star) will
harm the recommender system performance. We
observed that ratings with very high confidence are
usually the source of undesired biases. A careful processing of the inferred ratings is required.
As far as we are aware, only a few authors have tackled
a similar problem [1, 19, 26, 28, 47]. However, none of the cited studies integrates unrated reviews in its framework.
For example, Jakob et al. [19] use explicit ratings and
enhance these existing ratings with opinion mining and
other techniques. Despite this difference, we compared the
proposed approach to Jakob’s et al. approach. This evalu-
ation illustrated how the recommendation framework is
competitive to similar approaches and how it can tackle
different scenarios.
As future work, we would like to improve the rating predictions by identifying reviews that are spam and by detecting particular aspects of the reviewed product. We
believe that other sentiment analysis algorithms can pro-
vide a finer-grain analysis of the user opinion and conse-
quently improve the overall recommendation performance.
Acknowledgments The authors are grateful to the authors of [19], who kindly provided us with their IMDb dataset. This
work has been funded by the Portuguese Foundation for Science and
Technology under project references UTA-Est/MAI/0010/2009 and
PEst-OE/EEI/UI0527/2011, Centro de Informática e Tecnologias da Informação (CITI/FCT/UNL), 2011-2012.
References
1. Aciar, S., et al.: Informed recommender: basing recommenda-
tions on consumer product reviews. IEEE Intell. Syst. 22(3),
39–47 (2007)
2. Adomavicius, G., Tuzhilin, A.: Toward the next generation of
recommender systems: a survey of the state-of-the-art and pos-
sible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749
(2005)
3. Baccianella, S., et al.: SentiWordNet 3.0: an enhanced lexical
resource for sentiment analysis and opinion mining. In: Pro-
ceedings of the Seventh Conference on International Language
Resources and Evaluation (LREC’10) (2010)
4. Baudisch, P.: Recommending TV programs: how far can we get
at zero user effort? AAAI Workshop on Recommender Systems
(1998)
5. Bespalov, D., et al.: Sentiment classification based on supervised
latent n-gram analysis. Building, 375–382 (2011)
6. Bollen, J.: Determining the public mood state by analysis of
microblogging posts. Alife XII Conf. MIT Press (2010)
7. Brown, B., Barkhuus, L.: The television will be revolutionized:
effects of PVRs and filesharing on television watching. ACM
SIGCHI Conference on Human Factors in Computing Systems.
ACM (2006)
8. Das, S., Chen, M.: Yahoo! for Amazon: sentiment parsing from
small talk on the Web. EFA 2001 Barcelona Meetings (2001)
9. Denecke, K.: Are SentiWordNet scores suited for multi-domain
sentiment classification? In: Proceedings of ICDIM’2009, 33–38
(2009)
10. Diakopoulos, N.A., Shamma, D.A.: Characterizing debate per-
formance via aggregated twitter sentiment. In: Proceedings of the
28th International Conference on Human Factors in Computing
Systems (2010)
11. Ding, X., et al.: A holistic lexicon-based approach to opinion
mining. In: Proceedings of the International Conference on Web
Search and Web Data Mining, pp. 231–240 (2008)
12. Esuli, A., Sebastiani, F.: Determining the semantic orientation of
terms through gloss classification. In: Proceedings of the 14th
ACM International Conference on Information and Knowledge
Management CIKM 05, 617 (2005)
13. Esuli, A., Sebastiani, F.: Sentiwordnet: a publicly available lex-
ical resource for opinion mining. In: Proceedings of the 5th
Conference on Language Resources and Evaluation (LREC’06).
Citeseer (2006)
14. Ferman, A.M., et al.: Multimedia content recommendation engine
with automatic inference of user preferences. In: IEEE Interna-
tional Conference on Image Processing (2003)
15. Harboe, G., et al.: Ambient social TV: drawing people into a
shared experience. In: ACM SIGCHI Conference on Human
Factors in Computing Systems. ACM (2008)
16. Haythornthwaite, C.: The strength and the impact of new media.
In: Proceedings of the 34th Annual Hawaii International Con-
ference on System Sciences (HICSS-34)-Volume 1-Volume 1.
IEEE Computer Society (2001)
17. Heerschop, B., et al.: Polarity analysis of texts using discourse
structure. In: Proceedings of the 20th ACM International Con-
ference on Information and Knowledge Management CIKM 11,
1061 (2011)
18. Hu, M., Liu, B.: Mining and summarizing customer reviews. In:
Proceedings of the Tenth ACM SIGKDD International Confer-
ence on Knowledge Discovery and Data Mining. ACM (2004)
19. Jakob, N., et al.: Beyond the stars: exploiting free-text user
reviews to improve the accuracy of movie recommendations. In:
Proceeding of the 1st International CIKM Workshop on TOPIC-
Sentiment Analysis for Mass Opinion (TSA), pp. 57–64 (2009)
20. Jenkins, H.: Convergence Culture—Where Old and New Collide.
NYU Press, New York (2006)
21. Jindal, N., Liu, B.: Opinion spam and analysis. In: WSDM’08
Proceedings of the International Conference on Web Search and
Web Data Mining, pp. 219–230 (2008)
22. Kim, S.-M., Hovy, E.: Determining the sentiment of opinions. In:
Proceedings of the 20th International Conference on Computa-
tional Linguistics COLING 04, 1367-es (2004)
23. Koren, Y.: Collaborative filtering with temporal dynamics. In:
Proceedings of the 15th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 447–456 (2009)
24. Koren, Y.: Factorization meets the neighborhood: a multifaceted
collaborative filtering model. In: Proceeding of the 14th ACM
SIGKDD International Conference on Knowledge Discovery and
Data Mining, pp. 426–434 (2008)
25. Koren, Y., et al.: Matrix factorization techniques for recom-
mender systems. IEEE Comput. 42(8), 30–37 (2009)
26. Leung, C.W.K., et al.: Integrating collaborative filtering and
sentiment analysis: a rating inference approach. In: Proceedings
of the ECAI 2006 Workshop on Recommender Systems,
pp. 62–66 (2006)
27. Liu, B.: Sentiment analysis and subjectivity. Handbook of Nat-
ural Language Processing. (2010), 978-1420085921
28. Moshfeghi, Y., et al.: Handling data sparsity in collaborative
filtering using emotion and semantic based features. In: Pro-
ceedings of the 34th International ACM SIGIR Conference on
Research and Development in Information—SIGIR’11 (New
York, NY, USA, Jul. 2011), 625 (2011)
29. Ohana, B., Tierney, B.: Sentiment classification of reviews using
SentiWordNet. In: 9th. IT & T Conference (2009)
30. Oliveira, E. et al.: Ifelt: accessing movies through our emotions.
In; Proceedings of the 9th International Interactive Conference on
Interactive Television—EuroITV’11 (New York, NY, USA, Jun.
2011), 105 (2011)
31. Pang, B., et al.: Thumbs up? Sentiment classification using
machine learning techniques. In: Proceedings of the ACL-02
Conference on Empirical Methods in Natural Language Pro-
cessing-Volume 10, 79–86 (2002)
32. Pang, B., Lee, L.: A sentimental education: sentiment analysis
using subjectivity summarization based on minimum cuts. In:
Proceedings of the Association for Computational Linguistics
(ACL) (2004)
33. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for
sentiment categorization with respect to rating scales. In: Pro-
ceedings of the 43rd Annual Meeting of the Association for
Computational Linguistics (ACL), pp. 115–124 (2005)
34. Qu, L., et al.: The bag-of-opinions method for review rating
prediction from sparse text patterns. In: Proceedings of the 23rd
International Conference on Computational Linguistics
(COLING '10), pp. 913–921 (2010)
35. Sparling, E.I.: Rating: how difficult is it? In: Proceedings of the
5th ACM Conference on Recommender Systems (RecSys '11),
pp. 149–156 (2011)
36. Takama, Y., Muto, Y.: Profile generation from TV watching
behavior using sentiment analysis. In: Proceedings of the 2007
IEEE/WIC/ACM International Conferences on Web Intelligence
and Intelligent Agent Technology—Workshops. IEEE Computer
Society (2007)
37. Takamura, H., et al.: Extracting semantic orientations of words
using spin model. In: Proceedings of the 43rd Annual Meeting of
the Association for Computational Linguistics (ACL '05),
pp. 133–140 (2005)
38. Turney, P.: Thumbs up or thumbs down? Semantic orientation
applied to unsupervised classification of reviews. In: Proceedings
of the 40th Annual Meeting on Association for Computational
Linguistics (2002)
39. Turney, P.D., Littman, M.L.: Measuring praise and criticism:
inference of semantic orientation from association. ACM Trans.
Inf. Syst. 21(4), 315–346 (2003)
40. Turney, P.D., Littman, M.L.: Unsupervised learning of semantic
orientation from a hundred-billion-word corpus. Technical
Report ERB-1094, National Research Council Canada (2002)
41. Uchyigit, G., Clark, K.: Personalised multi-modal electronic
program guide. In: European Conference on Interactive Televi-
sion: from Viewers to Actors? (2003)
42. Vildjiounaite, E., Kyllonen, V.: Unobtrusive dynamic modelling
of TV program preferences in a household. In: Changing Tele-
vision Environments (EuroITV 2008). Springer (2008)
43. Xu, J., Zhang, L.: The development and prospect of personalized
TV program recommendation systems. In: IEEE International
Symposium on Multimedia Software Engineering (2002)
44. Yi, J., et al.: Sentiment analyzer: extracting sentiments about a
given topic using natural language processing techniques. In:
IEEE International Conference on Data Mining (ICDM),
pp. 427–434 (2003)
45. Yuan, G.X., et al.: Recent advances of large-scale linear classi-
fication. Computer 3, 1–15 (2011)
46. Zaletelj, J., et al.: Real-time viewer feedback in the iTV pro-
duction. In: Proceedings of the European Conference on Inter-
active Television and Video. ACM (2009)
47. Zhang, W., et al.: Augmenting online video recommendations by
fusing review sentiment classification. In: 2010 IEEE Interna-
tional Conference on Data Mining Workshops (ICDMW),
pp. 1143–1150 (2010)