SIGIR-Open Source Information Retrieval workshop, Aug. 10. 2006, Seattle, WA.
Web Recommender System Implementations in Multiple Flavors:
Fast and (Care-)Free for All
Olfa Nasraoui, Zhiyong Zhang, Esin Saka
Dept of Computer Engineering & Computer Sciences
University of Louisville
E-mail: [email protected]
URL: http://www.louisville.edu/~o0nasr01/
SIGIR-OSIR workshop, Aug. 2006, Seattle.
Nasraoui et al.: Web Recommender System Implementations in Multiple Flavors: Fast and (Care-)Free for All
Outline
• Motivations & Objectives
• Background in Web Recommender Systems
• Background in search: Lucene+nutch
• Methodology
• Preliminary Experimental Results
• Conclusion
Motivations
• Recommender systems are attracting more and more attention as a suitable approach against information overload
• Recently: significant advances in theoretical and algorithmic aspects
• But no clear, systematic, and easy implementation approaches have been published openly
• Typical commercial packages may not be affordable to smaller or start-up e-commerce and e-media websites, or to community websites
• Quickly implementing recommender systems for research purposes may also be a headache
Objectives
• Systematic framework for a fast and easy implementation and deployment of a recommendation system
• on one or several affiliated or subject-specific websites
• based on any available combination of open source tools that include
  – crawling,
  – indexing, and
  – searching capabilities
Supported Approaches
• Content-based filtering (straightforward)
• Collaborative filtering (more complex)
• Rule-based filtering
• Approaches that deal with meta-content (non-textual) attributes and ontologies
• Other variants that include
  – meta-attributes about the user, such as user profiles,
  – business strategy rules.
What for?
• easily "implement" (existing) recommendation strategies by using search engine software when it is available,
• benefit research and real-life applications
  – by taking advantage of search engines' scalable, built-in indexing and query matching features,
  – instead of implementing a strategy from scratch.
Advantages to Expect…
• Multi-Website Integration by Dynamic Linking:
  – dynamic, personalized, and automated linking of partnering or affiliate websites
  – crawl several websites + connect through a common proxy
• Giving Control Back to the User or Community instead of the website/business
  – no need for intervention from websites
• The Open Source Edge
• Tapping into IR Legacy
• In future: Semantic Web…?
Recommender Systems
• automatically recommend hyperlinks deemed to be relevant to the user’s interests,
• can facilitate access to the needed information on a large website,
• usually implemented on the Web server,
• rely on data that reflects the user’s interest
  – implicitly (browsing history as recorded in Web server logs) or
  – explicitly (user profile as entered through a registration form or questionnaire).
How can they be classified? …
[Diagram: Past Information (?) + Current Information (?) → Algorithm (?) → Recommendations]
• Past information: transactions, clickstreams, ratings, demographic data
• Current information: viewed products, pages, purchases during the current session or transaction
Five Main Types…
• Content-based or Item-based filtering
• Collaborative filtering
• Knowledge Engineering or Rule-based filtering
• Demographic recommender systems
• Hybrids
Main Distinctions
• Collaborative Filtering: recommend items liked by similar users
  – Technique:
    • Neighborhood Formation: based on the users or items, by correlation, similarity, or cluster formation
    • Association Rule Mining: association between items or users
    • Machine Learning: inductively learn patterns or prediction models from data by Bayesian networks, neural nets, decision trees, clustering
• Content-Based Filtering: recommend items similar to the ones liked by the current user
  – Technique:
    • Neighborhood Formation: use co-occurrence, clustering, or similarity analysis to determine neighborhoods of “content”
• Knowledge Engineering or Rule-based filtering: determine recommendations based on rules/dialog with the user
  – Technique:
    • use human effort or heuristics to identify factors that determine user interests
    • Case-based reasoning, decision support tools
Hybrids
• Merging results of different approaches: combine results of CF and CBF using a weighted sum, merging of lists, etc.
• Hybrid examples:
  – Collaborative Filtering augmented by Content-Based Filtering: still within the collaborative framework, but add item attributes into each user’s attributes (i.e. infuse item data within user data)
  – Content-Based Filtering augmented by Collaborative Filtering: identify items similar to the ones liked by this user, and then add items liked by similar users
Search Engine
• Crawling: A crawler retrieves the web pages that are to be included in a searchable collection,
• Parsing: The crawled documents are parsed to extract the terms that they contain,
• Indexing: An inverted index is typically built that maps each parsed term to a set of pages where the term is contained,
• Query matching: submit input queries in the form of a set of terms to a search engine interface or to a query matching module that compares this query against the existing index, to produce a ranked list of results or web pages.
• We focus on two open source products that enable a fast and free implementation of Web search:
  – a text search engine library called Lucene,
  – a Web search engine, Nutch, that is built on Lucene.
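The indexing and query matching stages above can be illustrated with a minimal sketch. This is plain Python, not Lucene or Nutch code; the page ids, the sample texts, and the "count matching terms" ranking are assumptions made only for illustration.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def match_query(index, query_terms):
    """Rank pages by the number of distinct query terms they contain."""
    hits = defaultdict(int)
    for term in {t.lower() for t in query_terms}:
        for page_id in index.get(term, ()):
            hits[page_id] += 1
    # Most matching terms first; ties broken by page id for determinism.
    return sorted(hits, key=lambda p: (-hits[p], p))

# Hypothetical toy collection.
pages = {
    "p1": "solar corona temperature",
    "p2": "solar wind and corona",
    "p3": "campus parking map",
}
index = build_inverted_index(pages)
ranked = match_query(index, ["solar", "corona", "wind"])
print(ranked)
```

A real engine replaces the simple hit count with a weighted score, but the lookup path (query terms → index → candidate pages) is the same.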
Lucene
• D. Cutting and J. Pedersen, Space optimizations for total ranking, RIAO (Computer Assisted IR) 1997
• http://lucene.apache.org/
• high-performance, full-featured text search engine library written in Java,
• can support any application that requires full-text search, especially cross-platform.
• Examples of using Lucene: Inktomi and Wikipedia's search feature
• powerful features through a simple API, including
  – scalable, high-performance indexing,
  – as well as powerful, accurate and efficient search algorithms.
• available as Open Source software under the Apache License
Lucene’s features
• ranked searching
• various query types: phrase, wildcard, proximity, fuzzy, range, and more
• fielded searching (e.g., title, author, contents),
• date-range searching,
• sorting by any field,
• multiple-index searching with merged results,
• allowing simultaneous update and searching
• All of the above: Heaven on Earth for implementing a recommender system!
Lucene: Query & Search
• Query: terms and operators, with two types of terms: Single Terms and Phrases
  – Multiple terms can be combined together with Boolean operators to form a more complex query.
  – Lucene also supports fielded data
    • search performed by specifying a field, or by using the default field
    • The field names and default field are implementation specific
  – Lucene queries can also boost a particular term so that it weighs more heavily in the matching process
• Scoring: Given an input query, Lucene scores the results based on the TF-IDF measure, where the documents are represented in the vector space model
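To make the effect of term boosting and TF-IDF scoring concrete, here is a small sketch in plain Python. It is a simplified vector-space model and not Lucene's actual scoring formula; the smoothed IDF, the toy documents, and the `boosts` parameter are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_vector(terms, doc_freq, n_docs, boosts=None):
    """Weight each term by tf * idf, optionally multiplied by a per-term boost."""
    boosts = boosts or {}
    vec = {}
    for term, tf in Counter(terms).items():
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0  # smoothed idf (assumption)
        vec[term] = tf * idf * boosts.get(term, 1.0)
    return vec

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy documents, already tokenized.
docs = {
    "d1": ["solar", "corona", "corona"],
    "d2": ["solar", "wind"],
    "d3": ["campus", "map"],
}
doc_freq = Counter(t for terms in docs.values() for t in set(terms))
doc_vecs = {d: tfidf_vector(ts, doc_freq, len(docs)) for d, ts in docs.items()}

def search(query_terms, boosts=None):
    """Return ids of documents with a non-zero score, best match first."""
    q = tfidf_vector(query_terms, doc_freq, len(docs), boosts)
    scores = {d: cosine(q, v) for d, v in doc_vecs.items()}
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))

print(search(["corona", "wind"]))                       # unboosted ranking
print(search(["corona", "wind"], boosts={"wind": 3.0}))  # "wind" weighs more heavily
```

In this toy collection, boosting "wind" moves the document that contains it ahead of the one that mentions "corona" twice, which is the kind of control a recommender can exploit when some session pages matter more than others.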
Nutch
• http://lucene.apache.org/nutch/: Lucene-based Web search
• adds Web specifics to Lucene: crawler, link-graph database, parsers for HTML and other document formats (pdf, ppt, doc, plain text, etc.)
• Other document types can be parsed by programming their specific plugin.
• maintains a database of pages and links.
• For each web page that is indexed, only the URL and title are stored, while the anchor and the content are only indexed.
• A document: a sequence of Fields.
  – A field is a <Name, Value> pair,
  – where Name is the name of the field, e.g., title, body, subject, date, etc.,
  – and Value is text.
  – Field values may be stored, indexed, analyzed (to convert to tokens), or vectored.
• Uses Lucene's index: an inverted index that maps a term → field ID, and a set of document IDs, with the position within each document.
• Given a query, Nutch by default searches URLs, anchors, and content of documents, and also rewards proximity of the query words in the documents.
Proposed Methodology
• Two requirements for tweaking a search engine to work like a recommender system:
1. An index: The source of the recommendations must be indexed in a format that is easy to search.
2. A querying mechanism:
  – the input to the recommendation procedure must be transformable into a query
  – The query is expressed in terms of the entities upon which the index is based.
  – For instance, if the index maps every keyword in a given vocabulary to a set of pages that contain this keyword, then the query needs to be expressed in terms of these keywords.
Notation
• assume that a raw session (and, for collaborative filtering, a simple collaborative profile) consists of a set of indexed items/Web pages Pi
  – note that this set can also be considered as a vector in the space defined by all possible indexed items, thus
    • raw-session = {P1, P2, ...}
    • profile = {P’1, P’2, ...}
Content-based filtering
Given a few pages that a user has viewed, the system recommends other pages with content that is similar to the content of the viewed pages.
• Step 1: Preliminary crawling and indexing of website(s) (done offline): crawl to form the content of the recommendations, then form a reverse index that maps each keyword to the set of pages in which it is contained.
  • Since we plan to compare a query to a set of web pages in vector space format, we store the most frequent terms in each document as a vector field, which is indexed and used later in retrieval by submitting a fielded query.
• Step 2: Query formation and scoring: transform a new user session into a query that can be submitted to the search engine.
  • Map each URL in the user session to a set of content terms (top k frequent terms) using an added package, net.nutch.searcher.pageurl.
  • Combine these terms with their frequencies to form a query vector.
  • Submit the query to Nutch as a fielded query (i.e. the query vector is compared to the indexed Web document vector field).
  • Finally, rank results according to cosine similarity with the query vector in the vector space domain
    • introduced as a modification of the default scoring mechanism of SortComparatorSource in the LuceneQueryOptimizer class (which is part of the package net.nutch.searcher)
session URLs → terms → fielded query vector → results (ranked according to cosine similarity(result vector, query vector))
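The pipeline session URLs → terms → query vector → ranked results can be sketched in a few lines of plain Python. This stands in for the Nutch-based implementation rather than reproducing it: the function names, the toy pages, and the decision to skip pages already in the session are assumptions for illustration.

```python
import math
from collections import Counter

def top_k_terms(text, k):
    """Keep the k most frequent terms of a page as a {term: frequency} vector."""
    return dict(Counter(text.lower().split()).most_common(k))

def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_by_content(session_urls, page_vectors, n=3):
    """Turn a session into a term-frequency query and rank other pages by cosine."""
    query = Counter()
    for url in session_urls:
        query.update(page_vectors.get(url, {}))  # add this page's term frequencies
    scored = [(cosine(query, vec), url)
              for url, vec in page_vectors.items()
              if url not in session_urls]        # assumption: do not re-recommend viewed pages
    scored.sort(key=lambda su: (-su[0], su[1]))
    return [url for score, url in scored if score > 0][:n]

# Step 1 (offline): hypothetical crawled pages reduced to top-k term vectors.
pages = {
    "/sun/corona": "corona solar corona plasma",
    "/sun/wind": "solar wind plasma solar",
    "/sun/flares": "solar flares corona",
    "/campus/parking": "parking permit campus parking",
}
page_vectors = {url: top_k_terms(text, k=3) for url, text in pages.items()}

# Step 2 (online): the current session becomes the query.
recs = recommend_by_content(["/sun/corona"], page_vectors)
print(recs)
```

The parameter k plays the same role as the "top k frequent terms" above: a larger k gives a richer page representation at the cost of a longer query.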
Content-based filtering w/ Nutch
Session query
Collaborative filtering & Hybrids
• Full-Fledged Pure Collaborative Filtering
• Hybrid Content via Collaborative Filtering with Cascaded/Feature Augmentation Combination
• Hybrid Content via Profile-based Collaborative Filtering with Cascaded/Feature Augmentation Combination
• Hybrid Content and Profile-based Collaborative Filtering with Weighted Combination
Collaborative filtering & Hybrids
• Full-Fledged Pure Collaborative Filtering
  • Step 1: Index sessions in URL domain (session items BECOME the content attributes)
  • Step 2:
    • Step 2.1: (session = query) → fielded query vector → results (sessions ranked according to cosine similarity(result session, query session))
    • Step 2.2: accumulate frequency of URLs in top K closest sessions → sort URLs top down → results (top N recommended URLs)
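Steps 2.1 and 2.2 of pure collaborative filtering can be sketched as follows. This is a plain-Python illustration, not the Nutch-based code: sessions are treated as binary URL sets, and the function names, the toy sessions, and the exclusion of URLs already in the current session are assumptions.

```python
import math
from collections import Counter

def session_cosine(a, b):
    """Cosine similarity between two sessions seen as binary URL vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend_collaborative(current, past_sessions, k=3, n=2):
    # Step 2.1: rank past sessions by similarity to the current session (the query).
    ranked = sorted(past_sessions, key=lambda s: -session_cosine(current, s))
    neighbors = [s for s in ranked[:k] if session_cosine(current, s) > 0]
    # Step 2.2: accumulate URL frequencies over the top K closest sessions.
    counts = Counter()
    for session in neighbors:
        counts.update(u for u in set(session) if u not in current)  # assumption: skip seen URLs
    ordered = sorted(counts.items(), key=lambda c: (-c[1], c[0]))
    return [url for url, _ in ordered[:n]]

# Hypothetical past sessions (Step 1 would index these in the URL domain).
past_sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/b", "/d"],
    ["/x", "/y"],
    ["/a", "/d", "/e"],
]
recs = recommend_collaborative(["/a", "/b"], past_sessions, k=3, n=2)
print(recs)
```

With a search engine, Step 2.1 is the part that the index answers directly; only the small accumulation loop of Step 2.2 has to be written by hand.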
Hybrids
[Diagram: two ways of combining collaborative filtering and content-based filtering]
• Parallel or Combination: Info 1 → Collaborative filtering → Recommendations; Info 2 → Content-based filtering → Recommendations; Combine → Recommendations (items)
• Sequential or Cascaded: Info 1 → Collaborative filtering → Collaborative session; Collaborative session + Info 2 → Content-based filtering → Recommendations (items)
Cascaded Hybrids
• Type 1: compares current session to (all) previous sessions
  [Diagram: Previous sessions → Collaborative filtering → Collaborative session; Collaborative session + Info 2 → Content-based filtering → Recommendations (items)]
• Type 2: compares current session to several user profiles
  [Diagram: User profiles → Collaborative filtering → Collaborative session; Collaborative session + Info 2 → Content-based filtering → Recommendations (items)]
Collaborative filtering & Hybrids
• Hybrid Content via Collaborative Filtering with Cascaded/Feature Augmentation Combination
  • Step 1.1: crawl + index web pages in term domain (like content-based filtering) → Index 1
  • Step 1.2: index sessions in URL domain, as in Step 1 of pure collaborative filtering → Index 2
  • Step 2 = Step 2 of pure collaborative filtering, i.e. match current session to similar sessions using Index 2 → results (top N recommended URLs) = collaborative session
  • Step 3 = Step 2 of content-based filtering, i.e. collaborative session URLs → terms → fielded query vector → results (ranked according to cosine similarity(result vector, query vector)) using Index 1
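The cascaded hybrid chains the two earlier procedures: the collaborative step produces a "collaborative session", which then serves as the query of the content-based step. A minimal plain-Python sketch follows. The function names, the toy data, keeping the current session's URLs inside the collaborative session, and excluding the collaborative session from the final list are all assumptions for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def collaborative_session(current, past_sessions, k, n):
    """Step 2: top-N URLs accumulated over the K sessions closest to the current one (Index 2)."""
    query = Counter(set(current))
    neighbors = sorted(past_sessions, key=lambda s: -cosine(query, Counter(set(s))))[:k]
    counts = Counter(u for s in neighbors for u in set(s))
    return [u for u, _ in sorted(counts.items(), key=lambda c: (-c[1], c[0]))[:n]]

def content_step(seed_urls, page_vectors, n):
    """Step 3: rank pages by cosine similarity with the terms of the seed URLs (Index 1)."""
    query = Counter()
    for url in seed_urls:
        query.update(page_vectors.get(url, {}))
    scored = sorted(((cosine(query, vec), url)
                     for url, vec in page_vectors.items() if url not in seed_urls),
                    key=lambda su: (-su[0], su[1]))
    return [url for score, url in scored if score > 0][:n]

def cascaded_hybrid(current, past_sessions, page_vectors, k=2, n_cf=2, n=2):
    seed = collaborative_session(current, past_sessions, k, n_cf)
    return seed, content_step(seed, page_vectors, n)

# Hypothetical Index 1 (page term vectors) and Index 2 (past sessions).
page_vectors = {
    "/a": {"solar": 2, "corona": 1},
    "/b": {"solar": 1, "wind": 2},
    "/c": {"corona": 2, "plasma": 1},
    "/d": {"campus": 2, "map": 1},
    "/e": {"corona": 1, "plasma": 2},
}
past_sessions = [["/a", "/c"], ["/a", "/c", "/b"], ["/d"]]
seed, recs = cascaded_hybrid(["/a"], past_sessions, page_vectors)
print(seed, recs)
```

Note that "/e" appears in no past session, yet it can be recommended because its content resembles the collaborative session; this is the benefit of cascading through the content index.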
Collaborative filtering & Hybrids
• Hybrid Content via Profile-based Collaborative Filtering with Cascaded/Feature Augmentation Combination
  • Step 0: Mine user profiles (done offline, or online with scalable Web usage mining techniques; profile = set of URLs relevant to a cluster of similar users)
  • Step 1: crawl + index web pages in term domain (like content-based filtering) → Index 1
  • Step 2 (similar to Step 2 of pure collaborative filtering): match current session to nearest profiles instead of nearest sessions (simple distance computation) → accumulate and sort URLs in profiles → results (top N recommended URLs) = collaborative session
  • Step 3 = Step 2 of content-based filtering, i.e. collaborative session URLs → terms → fielded query vector → results (ranked according to cosine similarity(result vector, query vector)) using Index 1
Collaborative filtering & Hybrids
• Hybrid Content and Profile-based Collaborative Filtering with Weighted Combination
  • Step 0: Mine user profiles (done offline, or online with scalable Web usage mining techniques; profile = set of URLs relevant to a cluster of similar users)
  • Step 1: crawl + index web pages in term domain (like content-based filtering) → Index 1
  • Step 2 (similar to Step 2 of pure collaborative filtering): match current session to nearest profiles instead of nearest sessions (simple distance computation) → accumulate and sort URLs in profiles → results (top N recommended URLs) = Rec1
  • Step 3 = Step 2 of content-based filtering, i.e. current session URLs → terms → fielded query vector → results = Rec2 (ranked according to cosine similarity(result vector, query vector)) using Index 1
  • Step 4: Combine recommendation sets: Rec = w1 Rec1 + w2 Rec2
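Step 4 can be read as a weighted sum of two scored recommendation lists. The sketch below is one possible reading in plain Python. The slides do not say how the two score scales are reconciled, so dividing each list by its maximum score is an assumption, as are the function name and the sample scores.

```python
def combine_weighted(rec1, rec2, w1=0.5, w2=0.5, n=3):
    """Rec = w1 * Rec1 + w2 * Rec2 over {url: score} maps, best first."""
    def normalize(scores):
        # Assumption: scale each list to [0, 1] so URL counts and cosines are comparable.
        top = max(scores.values(), default=0.0)
        return {u: s / top for u, s in scores.items()} if top > 0 else {}

    r1, r2 = normalize(rec1), normalize(rec2)
    combined = {u: w1 * r1.get(u, 0.0) + w2 * r2.get(u, 0.0)
                for u in set(r1) | set(r2)}
    return sorted(combined, key=lambda u: (-combined[u], u))[:n]

# Hypothetical inputs: Rec1 holds URL frequencies from the nearest profiles,
# Rec2 holds cosine similarities from the content-based step.
rec1 = {"/a": 4, "/b": 2}
rec2 = {"/b": 0.9, "/c": 0.6}

print(combine_weighted(rec1, rec2))                  # equal weights
print(combine_weighted(rec1, rec2, w1=1.0, w2=0.0))  # collaborative only
```

Setting one weight to zero recovers the corresponding pure strategy, which makes the weights a convenient knob when comparing the flavors.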
Other Recommendation Approaches
• The content attributes are stored in a database (e.g. product attributes): index as fielded data & use fielded queries
• The content attributes are described or connected via an ontology or a set of rules: index as fielded data, use fielded queries, & modify searcher similarity matching to take into account ontology + rules
  – Similarity degree may be suppressed or promoted depending on how a recommendation fits a given rule.
• An elaborate user profile: modify searcher similarity matching to take into account the user profile
• A set of business rules: modify searcher similarity matching to implement these rules
• Finally, recommend based on anchor (instead of content): we did this by storing anchor info for each page + transforming the session into a list of terms in anchor text(s)
Implementation
• Crawled the web pages in the louisville.edu domain,
  – where we limited the crawling to a depth of 6 levels, resulting in 30,405 pages
  – (this corresponds to Step 1 of content-based filtering)
• The content was indexed using nutch (23,097 unique terms were indexed),
• the nutch search engine application was launched to accept queries (in our case, transformed user sessions!)
• A proxy was set at one port on our server http://136.165.45.122:8089/ based on the Open Source SQUID Web proxy software (http://www.squid-cache.org/)
• With additional C code to track each session, convert it to an appropriate query, and submit this query to nutch
Quality Of Content-based Filtering Recommendations
• Quality was measured by comparing the contents of the Nth recommended page(s) to the actual pages to be visited in the remainder of a real session, using the cosine similarity.
• The k most frequent terms in a page were used to form its vector space representation. First, we varied k to take the values: 5, 10, 20, 30, 40.
• The evaluation used 450 sessions consisting of at least 3 clicks, chosen randomly from one day's web logs.
• The quality of the results saturated around k=30.
[Figure: Content-based Recommendation Similarity Results. x-axis: Nth recommendation page (1 to 10); y-axis: cosine similarity with to-be-visited page (0.18 to 0.32); curves: top-5, top-10, top-20, top-30 terms]
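The quality measure behind this plot can be sketched as follows: for each rank N, compare the Nth recommended page with the pages the user actually went on to visit. The slides do not state how similarities to several remaining pages are aggregated, so averaging them is an assumption here, as are the function names and the toy vectors.

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_by_rank(recommended, to_be_visited, page_vectors):
    """For each rank N, average cosine between the Nth recommendation
    and every page visited in the remainder of the session (assumed aggregation)."""
    results = []
    for url in recommended:
        sims = [cosine(page_vectors[url], page_vectors[v]) for v in to_be_visited]
        results.append(sum(sims) / len(sims) if sims else 0.0)
    return results

# Hypothetical top-k term vectors for two recommended and two later-visited pages.
page_vectors = {
    "/r1": {"solar": 1, "corona": 1},
    "/r2": {"campus": 1},
    "/v1": {"solar": 1, "corona": 1},
    "/v2": {"solar": 1},
}
per_rank = similarity_by_rank(["/r1", "/r2"], ["/v1", "/v2"], page_vectors)
print(per_rank)
```

Averaging `per_rank` position by position over many held-out sessions would give one curve of the kind shown in the figure, one curve per value of k.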
Quality Of Content-based Filtering Recommendations
• fixed the maximum number of terms (k=10) in the vector space representation
• determined the recommendation quality for sessions of different lengths
[Figure: Content-based Recommendation Similarity for Different Session Length. x-axis: Nth recommendation page (1 to 10); y-axis: cosine similarity with to-be-visited page (0.16 to 0.3); curves: 2-, 3-, 4-, 5-, and 6-length sessions]
Implementation
• Crawled web pages in the following domains:
  – .wikipedia.org
  – .ucar.edu
  – .nasa.gov
  (this corresponds to Step 1 of content-based filtering)
• The content was indexed using nutch
• the nutch search engine application was launched to accept queries (in our case, transformed user sessions!)
• A proxy was set at one port on our server based on the Open Source SQUID Web proxy software (http://www.squid-cache.org/)
• Additional C code to track each session, convert it to an appropriate query, and submit this query to nutch
• As the user navigates through pages in the 3 crawled/indexed domains above, cross-website recommendations can be accessed by following a link added to the top of each page.
Example 1: visiting pages about solar corona on windows.ucar.edu brings up related recommendations from other domains/websites: ucar.edu, nasa.gov, wikipedia.org
Snapshots of some related recommendations from other domains/websites: ucar.edu, nasa.gov, wikipedia.org
Conclusions
• a systematic framework for a fast and easy implementation and deployment of a recommendation system for one or several websites
• based on open source tools that include crawling, indexing, and searching capabilities.
• 2 main ingredients: indexing + transforming the session into a query
• The supported recommendation strategies include
  – several popular flavors such as
    • content-based filtering (straightforward: just crawl, index, query),
    • collaborative filtering (more complex: need session index OR mined profiles),
    • hybrids (combine the above 2 in cascaded or parallel mode),
    • rule-based (modify similarity used in matching query to content),
  – as well as approaches that deal with non-textual attributes (fielded vector + query), ontologies, or rich user profiles (modify similarity in matching)
• The biggest advantage of this approach is client- or proxy-controlled integration + dynamic linking of several websites.