A Vector Space Model for Automatic Indexing
G. Salton, A. Wong and C. S. Yang
Enhanced Vector Space Models for Content-based Recommender Systems
Cataldo Musto
Presenter: Sawood Alam <[email protected]>
A Vector Space Model for Automatic Indexing
G. Salton, A. Wong and C. S. Yang
Cornell University
Introduction
• In document retrieval, the best indexing space is one in which each entity lies as far away from the others as possible
• The density of the object space then becomes a measure of the value of an indexing system
• Retrieval performance may correlate inversely with space density
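To make the density idea concrete, here is a minimal sketch that scores a collection by the average similarity between each document vector and the collection centroid. The function names, the use of cosine similarity, and the toy vectors are assumptions made for illustration, not details taken from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(docs):
    """Component-wise mean of all document vectors."""
    n = len(docs)
    return [sum(column) / n for column in zip(*docs)]

def space_density(docs):
    """Average similarity of each document to the centroid (assumed measure)."""
    c = centroid(docs)
    return sum(cosine(c, d) for d in docs) / len(docs)

# Toy collections: one where documents share no terms, one where they overlap.
spread_out = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
bunched_up = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]

# The overlapping collection is denser than the spread-out one.
print(space_density(spread_out) < space_density(bunched_up))
```

Under this measure a lower score means documents sit farther apart, which is the situation the bullets above associate with better retrieval.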
Document Space
• Di = (di1, di2, di3, …, dij), where dij is the weight of the j-th term in document Di
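As a small illustration of this representation, the sketch below builds one vector per document, reading each component dij as a weight for term j in document i. Raw term frequency as the weight, whitespace tokenization, and the helper names are all assumptions chosen to keep the example short.

```python
from collections import Counter

def build_vocabulary(texts):
    """Sorted list of the distinct lowercase tokens across all texts."""
    return sorted({tok for text in texts for tok in text.lower().split()})

def to_vector(text, vocabulary):
    """Term-frequency vector: component j counts vocabulary[j] in text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical three-document collection.
texts = [
    "vector space model",
    "space density and indexing",
    "vector indexing",
]
vocab = build_vocabulary(texts)
vectors = [to_vector(t, vocab) for t in texts]

for text, vec in zip(texts, vectors):
    print(vec, text)
```

Each printed row is one Di; the column order is fixed by the shared vocabulary, so vectors from different documents can be compared component by component.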
Document Space (cont.)
Document Space (cont.)
Indexing Performance vs. Space Density
Cluster Density vs. Indexing Performance
Discrimination Value Model
Discrimination Value Model (cont.)
Discrimination Value Model Summary
Average Recall vs. Precision
Summary Recall vs. Precision
Enhanced Vector Space Models for Content-based Recommender Systems
Cataldo Musto
Dept. of Computer Science
University of Bari, [email protected]
Introduction
• Using Vector Space Models (VSM) in Information Retrieval is an established practice
• Goal: investigate the impact of vector space models in Information Filtering
  – Recommender systems
Problems of VSM
• High dimensionality
  – Becoming more serious as emerging social apps and micro-blogging generate large amounts of web content and new vocabulary
• Inability to manage document semantics
  – The order of term occurrences in the document is not taken into account
Components
• Context vector for each term
  – Values in {-1, 0, 1}
• Vector Space representation of a term (t)
• Vector Space representation of a document (d)
• Vector Space representation of a user profile (pu)
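The components listed above can be sketched roughly as follows. This is one plausible reading rather than the author's implementation: it assumes a term vector is the sum of the context vectors of the terms it co-occurs with, a document vector is the sum of its term vectors, and a profile is the sum of the vectors of documents the user liked. `DIM`, `NONZERO`, the helper names, and the toy documents are hypothetical.

```python
import random

DIM = 16      # reduced dimensionality (illustrative value)
NONZERO = 4   # non-zero entries per context vector (illustrative value)

def make_context_vector(rng, dim=DIM, nonzero=NONZERO):
    """Sparse ternary vector: `nonzero` entries are +1 or -1, the rest are 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def add(u, v):
    """Component-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def build_term_vectors(docs, context):
    """Assumed rule: sum the context vectors of terms sharing a document."""
    term_vecs = {t: [0] * DIM for t in context}
    for doc in docs:
        for t in doc:
            for other in doc:
                if other != t:
                    term_vecs[t] = add(term_vecs[t], context[other])
    return term_vecs

def document_vector(doc, term_vecs):
    """Assumed rule: a document is the sum of the vectors of its terms."""
    vec = [0] * DIM
    for t in doc:
        vec = add(vec, term_vecs[t])
    return vec

def profile_vector(liked_docs, term_vecs):
    """Assumed rule: a profile is the sum of the liked documents' vectors."""
    vec = [0] * DIM
    for doc in liked_docs:
        vec = add(vec, document_vector(doc, term_vecs))
    return vec

rng = random.Random(0)  # fixed seed so runs are repeatable
docs = [
    ["space", "opera", "film"],
    ["space", "documentary"],
    ["romantic", "comedy", "film"],
]
vocab = sorted({t for doc in docs for t in doc})
context = {t: make_context_vector(rng) for t in vocab}
term_vecs = build_term_vectors(docs, context)
profile = profile_vector(docs[:2], term_vecs)  # user liked the first two
print(len(profile), profile)
```

Every vector here has `DIM` components regardless of vocabulary size, which is how this kind of scheme sidesteps the high-dimensionality problem noted earlier.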
Indexing Technique
• Random Indexing-based model
• Weighted Random Indexing-based model
• Semantic Vector-based model
• Weighted Semantic Vector-based model
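The slides name the four models without giving their formulas, so the sketch below only illustrates the general difference between an unweighted and a weighted profile. It assumes the unweighted profile sums the vectors of documents rated at or above a threshold, and the weighted one scales each of those vectors by its normalized rating. The function names, the threshold, and the ratings are hypothetical, and the Semantic Vector variants are not covered.

```python
def add(u, v):
    """Component-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def scale(vec, factor):
    """Multiply every component by `factor`."""
    return [factor * x for x in vec]

def unweighted_profile(rated_docs, threshold):
    """Sum the vectors of documents rated at or above `threshold`."""
    dim = len(rated_docs[0][0])
    profile = [0.0] * dim
    for vec, rating in rated_docs:
        if rating >= threshold:
            profile = add(profile, vec)
    return profile

def weighted_profile(rated_docs, threshold, max_rating):
    """Same selection, but each vector is scaled by rating / max_rating."""
    dim = len(rated_docs[0][0])
    profile = [0.0] * dim
    for vec, rating in rated_docs:
        if rating >= threshold:
            profile = add(profile, scale(vec, rating / max_rating))
    return profile

# Hypothetical (document vector, rating) pairs on a 1-5 scale.
rated_docs = [([1, 0, 2], 5), ([0, 1, 0], 1), ([2, 2, 0], 4)]

print(unweighted_profile(rated_docs, threshold=4))
print(weighted_profile(rated_docs, threshold=4, max_rating=5))
```

With weighting, the document rated 5 contributes more to the profile than the one rated 4, whereas the unweighted version treats both liked documents equally.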
Experimental Evaluation
Conclusions
• The first prototype, with a naive weighting scheme, is comparable to other content-based filtering techniques such as a Bayesian classifier
• More complex weighting schemes are expected to perform better
• User profiles based on Linked Data, rather than keyword-based user profiles, may be studied