SIGIR-Open Source Information Retrieval workshop, Aug. 10. 2006, Seattle, WA.

Web Recommender System Implementations in Multiple Flavors:

Fast and (Care-)Free for All

Olfa Nasraoui, Zhiyong Zhang, Esin Saka
Dept of Computer Engineering & Computer Sciences
University of Louisville
E-mail: [email protected]

URL: http://www.louisville.edu/~o0nasr01/

SIGIR-OSIR workshop, Aug. 2006, Seattle.

Nasraoui et al.: Web Recommender System Implementations in Multiple Flavors: Fast and (Care-)Free for All

Outline
• Motivations & Objectives
• Background in Web Recommender Systems
• Background in search: Lucene + Nutch
• Methodology
• Preliminary Experimental Results
• Conclusion

Motivations
• Recommender systems are attracting more and more attention as a suitable approach against information overload
• Recently: significant advances in theoretical and algorithmic aspects
• But no clear, systematic, and easy implementation approaches have been published openly
• Typical commercial packages may not be affordable to smaller or start-up e-commerce websites, e-media websites, and community websites
• Quickly implementing recommender systems for research purposes may also be a headache

Objectives
• Systematic framework for a fast and easy implementation and deployment of a recommendation system
• on one or several affiliated or subject-specific websites
• based on any available combination of open source tools that include
  – crawling,
  – indexing, and
  – searching capabilities

Supported Approaches
• Content-based filtering (straightforward)
• Collaborative filtering (more complex)
• Rule-based filtering
• Approaches that deal with meta-content (non-textual) attributes and ontologies
• Other variants that include
  – meta-attributes about the user, such as user profiles,
  – business strategy rules.

What for?
• easily "implement" (existing) recommendation strategies by using search engine software when it is available,
• benefit research and real-life applications
  – by taking advantage of search engines' scalable, built-in indexing and query matching features,
  – instead of implementing a strategy from scratch.

Advantages to Expect…
• Multi-Website Integration by Dynamic Linking:
  – dynamic, personalized, and automated linking of partnering or affiliate websites
  – Crawl several websites + connect through a common proxy
• Giving Control Back to the User or Community instead of the website/business
  – no need for intervention from websites
• The Open Source Edge
• Tapping into IR Legacy
• In future: Semantic Web…?

Recommender Systems
• automatically recommend hyperlinks deemed to be relevant to the user’s interests,
• can facilitate access to the needed information on a large website,
• usually implemented on the Web server,
• rely on data that reflects the user’s interest
  – implicitly (browsing history as recorded in Web server logs) or
  – explicitly (user profile as entered through a registration form or questionnaire).

How can they be classified? …
[Diagram: Past Information (?) and Current Information (?) feed an Algorithm (?), which produces Recommendations]
• Past Information: transactions, clickstreams, ratings, demographic data
• Current Information: viewed products, pages, purchases during the current session or transaction

Five Main Types…

• Content-based or Item-based filtering
• Collaborative filtering
• Knowledge Engineering or Rule-based filtering
• Demographic recommender systems
• Hybrids

Main Distinctions
• Collaborative Filtering: recommend items liked by similar users
  – Technique:
    • Neighborhood Formation: based on the users or items, by correlation, similarity, or cluster formation
    • Association Rule Mining: association between items or users
    • Machine Learning: inductively learn patterns or prediction models from data by Bayesian networks, neural nets, decision trees, clustering
• Content-Based Filtering: recommend items similar to the ones liked by the current user
  – Technique:
    • Neighborhood Formation: use co-occurrence, clustering, or similarity analysis to determine neighborhoods of “content”
• Knowledge Engineering or Rule-based filtering: determine recommendations based on rules/dialog with the user
  – Technique:
    • use human effort or heuristics to identify factors that determine user interests
    • Case-based reasoning, decision support tools

Hybrids
• Merging results of different approaches: combine results of CF and CBF using a weighted sum, merging of lists, etc.
• Hybrid examples:
  – Collaborative Filtering augmented by Content-Based Filtering: still within the collaborative framework, but add item attributes into each user’s attributes (i.e. infuse item data within user data)
  – Content-Based Filtering augmented by Collaborative Filtering: identify items similar to the ones liked by this user, and then add items liked by similar users

Search Engine
• Crawling: A crawler retrieves the web pages that are to be included in a searchable collection,
• Parsing: The crawled documents are parsed to extract the terms that they contain,
• Indexing: An inverted index is typically built that maps each parsed term to the set of pages in which the term is contained,
• Query matching: input queries, in the form of a set of terms, are submitted to a search engine interface or to a query matching module that compares the query against the existing index to produce a ranked list of results or web pages.
• We focus on two open source products that enable a fast and free implementation of Web search:
  – a text search engine library called Lucene,
  – a Web search engine, Nutch, that is built on Lucene
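
The parsing, indexing, and query matching stages listed on this slide can be illustrated with a very small sketch. It is not how Lucene or Nutch implement them: the tokenizer, the function names, the sample pages, and the ranking rule (count of matching query terms) are all assumptions chosen to keep the example short, and crawling is replaced by a hard-coded dictionary of pages.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Parsing step: lowercase the text and split it into alphanumeric terms."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()

def build_inverted_index(pages):
    """Indexing step: map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def match_query(index, query_terms):
    """Query matching step: rank pages by how many query terms they contain."""
    hits = Counter()
    for term in query_terms:
        for url in index.get(term, ()):
            hits[url] += 1
    # Sort by descending hit count, then by URL so that ties are deterministic.
    return sorted(hits, key=lambda url: (-hits[url], url))

# Hypothetical "crawled" pages; a real crawler is out of scope for this sketch.
pages = {
    "/sun/corona.html": "The solar corona is the outer atmosphere of the Sun",
    "/sun/wind.html": "Solar wind streams away from the corona",
    "/earth/ocean.html": "Ocean currents move heat around the Earth",
}
index = build_inverted_index(pages)
ranked = match_query(index, ["solar", "corona", "sun"])
print(ranked)
```

A real engine replaces the hit count with a weighted score, but the data flow (pages, then terms, then an inverted index, then a ranked result list) is the same one the recommender reuses later.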

Lucene
• D. Cutting and J. Pedersen, Space optimizations for total ranking, RIAO (Computer Assisted IR) 1997
• http://lucene.apache.org/
• high-performance, full-featured text search engine library written in Java,
• can support any application that requires full-text search, especially cross-platform.
• Examples of using Lucene: Inktomi and Wikipedia's search feature
• powerful features through a simple API, including
  – scalable, high-performance indexing,
  – as well as powerful, accurate, and efficient search algorithms.
• available as Open Source software under the Apache License

Lucene’s features

• ranked searching
• various query types: phrase, wildcard, proximity, fuzzy, range, and more
• fielded searching (e.g., title, author, contents),
• date-range searching,
• sorting by any field,
• multiple-index searching with merged results,
• allowing simultaneous update and searching
• All of the above: Heaven on Earth for implementing a recommender system!

Lucene: Query & Search
• Query: terms and operators, with two types of terms: Single Terms and Phrases
  – Multiple terms can be combined together with Boolean operators to form a more complex query.
  – Lucene also supports fielded data
    • search is performed by specifying a field, or by using the default field
    • the field names and default field are implementation specific
  – Lucene queries can also boost a particular term so that it weighs more heavily in the matching process
• Scoring: Given an input query, Lucene scores the results based on the TF-IDF measure, where the documents are represented in the vector space model
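
To make the scoring bullet concrete, here is a minimal sketch of TF-IDF weighting with cosine matching and per-term boosts. It is a simplified textbook variant, not Lucene's actual scoring formula (which includes further factors); the smoothed IDF, the function names, the boost dictionary, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf * idf} vector per document (docs are token lists)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so that no term gets weight 0.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def score(query_terms, docs, boosts=None):
    """Score every document against the query; boosts scale query term weights."""
    boosts = boosts or {}
    vectors, idf = tfidf_vectors(docs)
    query = {
        t: tf * idf.get(t, 0.0) * boosts.get(t, 1.0)
        for t, tf in Counter(query_terms).items()
    }
    return [cosine(query, d) for d in vectors]

docs = [
    ["solar", "corona", "sun", "sun"],
    ["solar", "wind"],
    ["ocean", "heat"],
]
# Boosting "corona" makes it weigh more heavily in the match.
scores = score(["solar", "corona"], docs, boosts={"corona": 2.0})
print(scores)
```

The first document contains both query terms and scores highest, the second shares only "solar", and the third shares nothing and scores zero.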

Nutch
• http://lucene.apache.org/nutch/: Lucene-based Web search
• adds Web specifics to Lucene: crawler, link-graph database, parsers for HTML and other document formats (pdf, ppt, doc, plain text, etc.).
• Other document types can be parsed by programming their specific plugin.
• maintains a database of pages and links.
• For each web page that is indexed, only the URL and title are stored, while the anchor and the content are only indexed.
• A document: a sequence of Fields.
  – A field is a <Name, Value> pair,
  – where Name is the name of the field, e.g., title, body, subject, date, etc.;
  – and Value is text.
  – Field values may be stored, indexed, analyzed (to convert to tokens), or vectored.
• Uses Lucene's index: an inverted index that maps a term to a field ID and a set of document IDs, with the position within each document.
• Given a query, Nutch by default searches URLs, anchors, and content of documents, and also rewards proximity of the query words in the documents.

Proposed Methodology
• Two requirements for tweaking a search engine to work like a recommender system:
  1. An index: The source of the recommendations must be indexed in a format that is easy to search.
  2. A querying mechanism:
    – the input to the recommendation procedure must be transformable into a query
    – the query is expressed in terms of the entities upon which the index is based.
    – For instance, if the index maps every keyword in a given vocabulary to a set of pages that contain this keyword, then the query needs to be expressed in terms of these keywords.

Notation
• assume that both a raw session and (for collaborative filtering) a simple collaborative profile consist of a set of indexed items/Web pages Pi
  – note that this set can also be considered as a vector in the space defined by all possible indexed items, thus
    • raw-session = {P1, P2, ..., etc}
    • profile = {P’1, P’2, ..., etc}

Content-based filtering
Given a few pages that a user has viewed, the system recommends other pages with content that is similar to the content of the viewed pages.
• Step 1: Preliminary Crawling and Indexing of website(s) (done offline): to form the content of the recommendations, and then form a reverse index that maps each keyword to the set of pages in which it is contained.
  • Since we plan to compare a query to a set of web pages in vector space format, we store the most frequent terms in each document as a vector field, which is indexed and used later in retrieval by submitting a fielded query.
• Step 2: Query Formation and Scoring: transform a new user session into a query that can be submitted to the search engine.
  • Map each URL in the user session to a set of content terms (top k frequent terms) using an added package net.nutch.searcher.pageurl.
  • Combine these terms with their frequencies to form a query vector,
  • Submit the query to Nutch as a fielded query (i.e. the query vector is compared to the indexed Web document vector field).
  • Finally, rank results according to cosine similarity with the query vector in the vector space domain
    • introduced as a modification of the default scoring mechanism of SortComparatorSource in the LuceneQueryOptimizer class (which is part of the package net.nutch.searcher)
session URLs → terms → fielded query vector → results (ranked according to cosine similarity (result vector, query vector))
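
The pipeline on this slide (session URLs, then terms, then a fielded query vector, then cosine-ranked results) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' Nutch modification: the function names, the field name `vector`, the toy pages, and the choice to exclude already-viewed pages are hypothetical. The query string uses the `field:(term^boost ...)` form of the classic Lucene query syntax only to show how term frequencies could be carried as boosts.

```python
import math
from collections import Counter

def top_k_terms(tokens, k):
    """Vector field of a page: its k most frequent terms with their counts."""
    counts = Counter(tokens)
    # Break frequency ties alphabetically so the result is deterministic.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(best)

def session_to_query_vector(session_urls, page_vectors):
    """Sum the term vectors of the pages visited in the session."""
    query = Counter()
    for url in session_urls:
        query.update(page_vectors.get(url, {}))
    return dict(query)

def to_fielded_query(query_vector, field="vector"):
    """Render the query vector as a fielded query string with term boosts."""
    terms = sorted(query_vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return f"{field}:(" + " ".join(f"{t}^{w}" for t, w in terms) + ")"

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(session_urls, page_vectors):
    """Rank pages outside the session by cosine similarity to the session query."""
    query = session_to_query_vector(session_urls, page_vectors)
    candidates = [u for u in page_vectors if u not in session_urls]
    scored = [(u, cosine(query, page_vectors[u])) for u in candidates]
    return sorted(scored, key=lambda us: (-us[1], us[0]))

# Step 1 (offline): hypothetical crawled pages reduced to their top-2 terms.
crawled = {
    "/a": ["solar", "corona", "solar", "sun"],
    "/b": ["corona", "heating", "corona", "plasma"],
    "/c": ["ocean", "heat", "ocean"],
    "/d": ["solar", "corona", "wind"],
}
page_vectors = {url: top_k_terms(tokens, k=2) for url, tokens in crawled.items()}

# Step 2 (online): turn the current session into a query and rank the rest.
session = ["/a", "/b"]
query_string = to_fielded_query(session_to_query_vector(session, page_vectors))
recommendations = recommend(session, page_vectors)
print(query_string)
print(recommendations)
```

In the deck's setup the ranking happens inside the search engine; here it is done in plain Python so that the cosine step is visible.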

Content-based filtering w/ Nutch

Session query

Collaborative filtering & Hybrids
• Full-Fledged Pure Collaborative Filtering

• Hybrid Content via Collaborative Filtering with Cascaded/Feature Augmentation Combination

• Hybrid Content via Profile-based Collaborative Filtering with Cascaded/Feature Augmentation Combination

• Hybrid Content and Profile-based Collaborative Filtering with Weighted Combination

Collaborative filtering & Hybrids: Full-Fledged Pure Collaborative Filtering
• Step 1: Index sessions in URL domain (session items BECOME the content attributes);
• Step 2:
  • Step 2.1: (session = query) → fielded query vector → results (sessions ranked according to cosine similarity (result session, query session))
  • Step 2.2: accumulate frequency of URLs in top K closest sessions → sort URLs top down → results (top N recommended URLs)
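
Steps 2.1 and 2.2 can be sketched as follows. This is only an illustration: sessions are treated as binary URL vectors, the past sessions are held in a plain list rather than a search index, the function names and sample data are hypothetical, and leaving out URLs that the current session already contains is an added assumption.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sessions treated as binary URL vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(current, past_sessions, top_k=3, top_n=2):
    # Step 2.1: the current session is the query; rank past sessions by cosine.
    ranked = sorted(past_sessions, key=lambda s: cosine(current, s), reverse=True)
    neighbors = ranked[:top_k]
    # Step 2.2: accumulate URL frequencies over the top K closest sessions,
    # skipping URLs already in the current session (assumption).
    counts = Counter(
        url for session in neighbors for url in set(session) if url not in current
    )
    # Sort URLs top down by frequency (ties broken by URL) and keep the top N.
    ordered = sorted(counts, key=lambda url: (-counts[url], url))
    return ordered[:top_n]

# Step 1 stand-in: hypothetical past sessions "indexed" in the URL domain.
past_sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/b", "/d"],
    ["/x", "/y"],
    ["/b", "/d", "/e"],
]
current_session = ["/a", "/b"]
recommended = recommend(current_session, past_sessions, top_k=3, top_n=2)
print(recommended)
```

With these sample sessions, `/d` appears in two of the three nearest sessions and `/c` in one, so `/d` is ranked first.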

Hybrids
[Figure: two hybrid architectures built from collaborative filtering and content-based filtering over inputs Info 1 and Info 2. "Parallel or Combination": the recommendations of both filters are combined into the final recommendations (items). "Sequential or Cascaded": collaborative filtering produces a collaborative session that feeds content-based filtering, which produces the recommendations (items).]

Cascaded Hybrids
[Figure: two cascaded hybrids in which collaborative filtering produces a collaborative session that is passed to content-based filtering (with Info 2) to produce the recommendations (items).]
• Type 1: compares current session to (all) previous sessions
• Type 2: compares current session to several user profiles

Collaborative filtering & Hybrids: Hybrid Content via Collaborative Filtering with Cascaded/Feature Augmentation Combination
• Step 1.1: crawl + index web pages in term domain (like content-based filtering) → Index 1
• Step 1.2: index sessions in URL domain as in Step 1 of pure collaborative filtering → Index 2
• Step 2 = Step 2 of pure collaborative filtering, i.e. match current session to similar sessions using Index 2 → results (top N recommended URLs) = collaborative session
• Step 3 = Step 2 of content-based filtering, i.e. collaborative session URLs → terms → fielded query vector → results (ranked according to cosine similarity (result vector, query vector)) using Index 1
Steps of pure collaborative filtering, repeated for reference:
• Step 1: Index sessions in URL domain (session items BECOME the content attributes);
• Step 2:
  • Step 2.1: (session = query) → fielded query vector → results (sessions ranked according to cosine similarity (result session, query session))
  • Step 2.2: accumulate frequency of URLs in top K closest sessions → sort URLs top down → results (top N recommended URLs)

Collaborative filtering & Hybrids: Hybrid Content via Profile-based Collaborative Filtering with Cascaded/Feature Augmentation Combination
• Step 0: Mine user profiles (done offline, or online with scalable Web usage mining techniques; profile = set of URLs relevant to a cluster of similar users)
• Step 1: crawl + index web pages in term domain (like content-based filtering) → Index 1
• Step 2 (similar to Step 2 of pure collaborative filtering): match current session to nearest profiles instead of nearest sessions (simple distance computation) → accumulate and sort URLs in profiles → results (top N recommended URLs) = collaborative session
• Step 3 = Step 2 of content-based filtering, i.e. collaborative session URLs → terms → fielded query vector → results (ranked according to cosine similarity (result vector, query vector)) using Index 1

Collaborative filtering & Hybrids: Hybrid Content and Profile-based Collaborative Filtering with Weighted Combination
• Step 0: Mine user profiles (done offline, or online with scalable Web usage mining techniques; profile = set of URLs relevant to a cluster of similar users)
• Step 1: crawl + index web pages in term domain (like content-based filtering) → Index 1
• Step 2 (similar to Step 2 of pure collaborative filtering): match current session to nearest profiles instead of nearest sessions (simple distance computation) → accumulate and sort URLs in profiles → results (top N recommended URLs) = Rec 1
• Step 3 = Step 2 of content-based filtering, i.e. current session URLs → terms → fielded query vector → results = Rec 2 (ranked according to cosine similarity (result vector, query vector)) using Index 1
• Step 4: Combine recommendation sets: Rec = w1 Rec1 + w2 Rec2
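
The weighted combination Rec = w1 Rec1 + w2 Rec2 can be sketched as below. The slides do not say how the two score lists are made comparable, so the max-normalization, the function names, the weights, and the sample scores are all assumptions; Rec 1 stands for accumulated URL counts from the nearest profiles and Rec 2 for cosine similarities from the content-based step.

```python
def normalize(scores):
    """Scale scores to [0, 1] by dividing by the maximum (assumed convention)."""
    top = max(scores.values(), default=0.0)
    return {url: s / top for url, s in scores.items()} if top else dict(scores)

def combine(rec1, rec2, w1=0.5, w2=0.5, top_n=3):
    """Weighted sum of two {url: score} recommendation sets."""
    rec1, rec2 = normalize(rec1), normalize(rec2)
    urls = set(rec1) | set(rec2)
    combined = {
        url: w1 * rec1.get(url, 0.0) + w2 * rec2.get(url, 0.0) for url in urls
    }
    # Highest combined score first; ties broken by URL for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical outputs of Step 2 (Rec 1) and Step 3 (Rec 2).
rec1 = {"/d": 2, "/c": 1}          # accumulated URL frequencies from profiles
rec2 = {"/c": 0.9, "/e": 0.6}      # cosine similarities from content matching
final = combine(rec1, rec2, w1=0.6, w2=0.4)
print(final)
```

A page recommended by both approaches (`/c` here) can overtake a page that is the top choice of only one of them, which is the point of the weighted hybrid.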

Other Recommendation Approaches
• The content attributes are stored in a database (e.g. product attributes): index as fielded data & use fielded queries
• The content attributes are described or connected via an ontology or a set of rules: index as fielded data, use fielded queries, & modify searcher similarity matching to take into account ontology + rules
  – Similarity degree may be suppressed or promoted depending on how a recommendation fits a given rule.
• An elaborate user profile: modify searcher similarity matching to take into account the user profile
• A set of business rules: modify searcher similarity matching to implement these rules
• Finally, recommend based on anchor (instead of content): we did this by storing anchor info for each page + transforming the session to a list of terms in anchor text(s)

Implementation
• Crawled the web pages in the louisville.edu domain,
  – where we limited the crawling to a depth of 6 levels, resulting in 30,405 pages
  – (this corresponds to Step 1 of content-based filtering)
• The content was indexed using Nutch (23,097 unique terms were indexed),
• the Nutch search engine application was launched to accept queries (in our case, transformed user sessions!)
• A proxy was set at one port on our server http://136.165.45.122:8089/ based on the Open Source SQUID Web proxy software (http://www.squid-cache.org/)
• With additional C code to track each session, convert it to an appropriate query, and submit this query to Nutch

Quality Of Content-based Filtering Recommendations

• Evaluated by comparing the contents of the Nth recommended page(s) to the actual pages to be visited in the remainder of a real session, using the cosine similarity.
• The k most frequent terms in a page were used to form its vector space representation. First, we varied k to take the values 5, 10, 20, 30, 40.
• The results used, for evaluation purposes, 450 sessions consisting of at least 3 clicks, chosen randomly from one day's web logs.
• The quality of the results saturated around k=30.
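
The evaluation measure described above can be sketched as follows. The slides do not state how the similarities to the several to-be-visited pages are aggregated, so taking their mean is an assumption, as are the function names and the toy term vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_at_rank(recommended, remainder, page_vectors):
    """For each rank N, mean cosine similarity (assumed aggregation) between
    the Nth recommended page and the pages visited in the rest of the session."""
    results = []
    for url in recommended:
        sims = [cosine(page_vectors[url], page_vectors[p]) for p in remainder]
        results.append(sum(sims) / len(sims) if sims else 0.0)
    return results

# Hypothetical top-k term vectors for recommended and to-be-visited pages.
page_vectors = {
    "/r1": {"solar": 2, "corona": 1},
    "/r2": {"ocean": 1},
    "/v1": {"solar": 2, "corona": 1},
    "/v2": {"corona": 1},
}
per_rank = similarity_at_rank(["/r1", "/r2"], ["/v1", "/v2"], page_vectors)
print(per_rank)
```

Averaging such per-rank values over many held-out sessions would give one curve per value of k, which is the kind of plot the slide reports.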

[Figure: Content-based Recommendation Similarity Results. x-axis: Nth recommendation page (1 to 10); y-axis: cosine similarity with to-be-visited page (0.18 to 0.32); curves: top-5 terms, top-10 terms, top-20 terms, top-30 terms]

Quality Of Content-based Filtering Recommendations

• fixed the maximum number of terms (k=10) in the vector space representation

• determined the recommendation quality for sessions of different lengths

[Figure: Content-based Recommendation Similarity for Different Session Length. x-axis: Nth recommendation page (1 to 10); y-axis: cosine similarity with to-be-visited page (0.16 to 0.3); curves: 2-length, 3-length, 4-length, 5-length, and 6-length sessions]

Implementation

• Crawled web pages in the following domains:
– .wikipedia.org
– .ucar.edu
– .nasa.gov
(this corresponds to Step 1 of content-based filtering)

• The content was indexed using nutch

• The nutch search engine application was launched to accept queries (in our case, transformed user sessions!)

• A proxy was set up at one port on our server, based on the open source SQUID Web proxy software (http://www.squid-cache.org/)

• Additional C code tracks each session, converts it to an appropriate query, and submits this query to nutch

• As the user navigates through pages in the 3 crawled/indexed domains above, cross-website recommendations can be accessed by following a link added to the top of each page.
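The session-to-query step can be illustrated with a short sketch. The authors implemented it in C alongside the proxy; the Python version below is only an illustration under stated assumptions: the session is summarized by its most frequent terms, and the search endpoint URL and its `query` parameter are placeholders rather than nutch's documented interface.

```python
from collections import Counter
from urllib.parse import urlencode

class SessionTracker:
    """Accumulates the term vectors of the pages visited in one session."""

    def __init__(self):
        self.term_counts = Counter()

    def visit(self, page_terms):
        # page_terms: {term: frequency} for the page just requested via the proxy
        self.term_counts.update(page_terms)

    def to_query(self, max_terms=10):
        # Assumption: the query is a plain keyword list of the top session terms.
        return " ".join(t for t, _ in self.term_counts.most_common(max_terms))

def search_url(base_url, query):
    # base_url and the "query" parameter name are hypothetical placeholders.
    return base_url + "?" + urlencode({"query": query})

tracker = SessionTracker()
tracker.visit({"solar": 3, "corona": 2})
tracker.visit({"solar": 1, "wind": 2})
url = search_url("http://localhost:8080/search", tracker.to_query(max_terms=2))
```

The resulting URL is what a recommendation link at the top of each proxied page could point to, so that following the link runs the session query against the index.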


Example 1: visiting pages about solar corona on windows.ucar.edu brings up related recommendations from other domains/websites: ucar.edu, nasa.gov, wikipedia.org


Snapshots of some related recommendations from other domains/websites: ucar.edu, nasa.gov, wikipedia.org


Outline

• Motivations & Objectives
• Background in Web Recommender Systems
• Background in Lucene+nutch
• Methodology
• Preliminary Experimental Results
• Conclusion


Conclusions

• A systematic framework for fast and easy implementation and deployment of a recommendation system for one or several websites

• Based on open source tools that include crawling, indexing, and searching capabilities

• 2 main ingredients: indexing + transforming the session into a query

• The supported recommendation strategies include
– several popular flavors such as
  • content-based filtering (straightforward: just crawl, index, query),
  • collaborative filtering (more complex: needs a session index OR mined profiles),
  • hybrids (combine the above 2 in cascaded or parallel mode),
  • rule-based (modify the similarity used in matching the query to content),
– as well as approaches that deal with non-textual attributes (fielded vector + query), ontologies, or rich user profiles (modify the similarity in matching)

• The biggest advantage of this approach is client- or proxy-controlled integration + dynamic linking of several websites.
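The two hybrid modes named above can be sketched as follows. The slides do not define them precisely, so this is one common interpretation: parallel mode merges the scores of both recommenders with a weighted sum, while cascaded mode lets one recommender build a shortlist that the other re-ranks. The function names, the `weight` parameter, and the choice of which recommender runs first are all assumptions.

```python
def parallel_hybrid(content_scores, collab_scores, weight=0.5, n=5):
    """Merge two {page: score} dicts with a weighted sum; return the top n pages."""
    pages = set(content_scores) | set(collab_scores)
    combined = {
        p: weight * content_scores.get(p, 0.0)
        + (1 - weight) * collab_scores.get(p, 0.0)
        for p in pages
    }
    # Sort by descending score, breaking ties by page name for determinism.
    return sorted(combined, key=lambda p: (-combined[p], p))[:n]

def cascaded_hybrid(collab_scores, content_scores, shortlist=10, n=5):
    """Shortlist by collaborative score, then re-rank the shortlist by content score."""
    first = sorted(collab_scores, key=lambda p: (-collab_scores[p], p))[:shortlist]
    return sorted(first, key=lambda p: (-content_scores.get(p, 0.0), p))[:n]

content = {"a": 0.9, "b": 0.2, "c": 0.5}
collab = {"b": 0.8, "c": 0.6, "d": 0.7}
top_parallel = parallel_hybrid(content, collab, weight=0.5, n=2)
top_cascaded = cascaded_hybrid(collab, content, shortlist=2, n=2)
```

In parallel mode a page needs support from both sources to rank highly, whereas in cascaded mode the first stage decides which pages are eligible at all.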

