Personalization in Information Retrieval, Extraction and Access
Workshop on Ontology, NLP, Personalization and IE/IR - IIT Bombay, Mumbai, 15-17 July 2008
Vasudeva Varma
www.iiit.ac.in/~vasu
Search Engine Heat is On!
IR and IE Technologies and Personalization (C) Vasudeva Varma IIIT-H
• Applications of Search Technologies
  - Web search
  - Product search
  - Service search
  - Domain search
• Already a BIG Market
• HUGE Opportunity
5/30/2008
Agenda
• Evolution of Search Engines
• Information Retrieval Vs. Extraction Vs. Access
• Personalization in IR, IE and IA
• Applications in Personalized IA
• Conclusions
Evolution of Search Engines
• Crawling and Indexing
• Topic directories
• Clustering and Classification
• Hyperlink analysis
• Resource discovery and vertical portals
• Semantic Web
• ???
Current IR engines fail – why?
• Wide variation in retrieval results
  - User topic
  - Retrieval system
• Different approaches work for different systems.
• No way to determine which approach will work for a particular query.
Solution:
• Deeper analysis of the content and the query
Motivation for Deeper Analysis
• Texts are one of the major sources of information and knowledge.
• However, they are not transparent.
• They have to be systematically integrated with other sources such as databases, numerical data, etc.
• NLP/IR/IE for better analysis
• IA for better presentation
Agenda
• Evolution of Search Engines
• Information Retrieval Vs. Extraction Vs. Access
• Personalization in IR, IE and IA
• Applications in Personalized IA
• Conclusions
IR vs. IE vs. IA
• IR: to search and retrieve documents in response to queries for information
Vs.
• IE: to extract information that fits pre-defined database schemas or templates, specifying the output formats
Vs.
• IA: to make the required information accessible to the user in their choice of language, mode, level of detail and format
[Figure, built up over slides 9-13. Labels: Collection of Texts; Characterization of Texts; Queries; IR System; Passage; Interpretation; Knowledge; IE System (Texts, Templates, Structures of Sentences, NLP); Information Access Technologies (Machine Translation, Summarization, Visualization Tools, Snippet Generation, NL Generation)]
Agenda
• Evolution of Search Engines
• Information Retrieval Vs. Extraction Vs. Access
• Personalization in IR, IE and IA
• Applications in Personalized IA
• Conclusions
Limitations of Current IR Systems
• All users get the same results for a given query, independent of:
  - Previous search history
  - Current search context
• All users are treated the same
• Does one size fit all?
Personalized Web Search
• Automatic adjustment of information content, structure, and presentation tailored to an individual user.
• Characteristics: Age, Gender, Special Interest Groups, Topic
• Personalize Search Results using
  - Personal content
  - Past Activities (long term and short term)
• Variations:
  - Explicit or Implicit profile setup
  - Explicit or Implicit relevance feedback
  - Client side or server side storage of information (privacy implications)
  - User control over amount of personalization
Overview of Personalized Search
Typically a 3-step process:
1. Obtain results (n >> 10)
2. Compute similarity (results, user)
3. Re-rank the results
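The three steps above can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes the user is represented as a bag of weighted terms and that similarity is plain cosine similarity. The names `tokenize`, `cosine` and `rerank` and the sample data are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization is enough for the illustration.
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rerank(results, user_profile):
    """results: list of (doc_id, text) from step 1; user_profile: Counter of term weights."""
    # Step 2: similarity between each result and the user.
    scored = [(cosine(Counter(tokenize(text)), user_profile), doc_id)
              for doc_id, text in results]
    # Step 3: stable sort by descending similarity (ties keep the original order).
    scored.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in scored]

# Step 1 (obtaining n >> 10 results) is replaced by a tiny hand-made list.
results = [("d1", "jaguar car dealership prices"),
           ("d2", "jaguar big cat habitat"),
           ("d3", "stock market news")]
profile = Counter({"cat": 3, "wildlife": 2, "habitat": 1})
ranking = rerank(results, profile)
```

Here the ambiguous query result about the animal moves to the top because it shares terms with the user's profile.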
Techniques
• Co-active Techniques
• Pro-active Techniques
• Collaborative Filtering
• User Profile based Result Pruning
• User Profile based Query Expansion
Problem Description
• Personalized Search - Issues
  - What to use to personalize?
  - How to personalize?
  - When not to personalize?
  - How to know whether personalization helped?
Problem Description
• We focus on the issue: how to personalize?
• Problem Statement
  - How to learn to personalize future searches using past search history
  - How to model and represent past search contexts
  - How to use them to improve search results
Solution - Outline
• Model and represent past user feedback – learning the user profile
  - Use implicit feedback
  - Long-term learning
  - User contexts – triples: {user, query, {relevant documents}}
• Improve search results – reranking
  - Get initial search results
  - Take the top few, rescore them using the user profile, and rearrange
Contributions
• I Search: A suite of approaches for Personalized Web Search
• Proposed Personalized search approaches
• Baseline
• Basic Retrieval methods
• Automatic Evaluation
• Analysis of Query Log
Review of Personalized Search
Personalized Search
  - Query logs
  - Machine learning
  - Language modeling
  - Community based
  - Others
I Search: A suite of Techniques for Personalized IR
• Suite of Approaches???
• Statistical Language modeling based approaches
  - Simple N-gram based methods
  - Noisy Channel Model based method
• Machine learning based approach
  - Ranking SVM based method
• Personalization without relevance feedback
  - Simple N-gram based method
Statistical Language Modeling based Approaches: Overview
• From user contexts, capture statistical properties of texts
• Use the same to improve search results
• Different Contexts
  - Unigrams and Bigrams
    - Simple N-gram based approaches
  - Relationship between query and document words
    - Noisy Channel based approach
Simple N-gram based approaches
• N-gram: general term for a sequence of N adjacent words
  - 1-gram: unigram, 2-gram: bigram
• Capture statistical properties of text
  - Single words (Unigrams)
  - Two adjacent words (Bigrams)
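As a small illustration of the unigram and bigram definitions above, the following sketch extracts word n-grams from a string. The function name `ngrams` and the whitespace tokenization are assumptions made for the example.

```python
def ngrams(text, n):
    # Split on whitespace and slide a window of n adjacent words over the tokens.
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

unigrams = ngrams("personalized web search", 1)
bigrams = ngrams("personalized web search", 2)
```

A text shorter than n words simply yields an empty list.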
Learning user profile
Given past search history
Hu = {(q1, rf1), (q2, rf2), …, (qn, rfn)}
• rfall = concatenation of all rf
• For each unigram wi
• User profile
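The formulas for this slide did not survive extraction. The sketch below shows one plausible reading, assuming the profile is the maximum-likelihood unigram distribution over the concatenated feedback text, i.e. the count of each word divided by the total number of words. That estimate, the function name `learn_unigram_profile` and the sample history are assumptions, not the slide's own definition.

```python
from collections import Counter

def learn_unigram_profile(history):
    """history: list of (query, rf) pairs, where rf is the text of the feedback documents."""
    # rf_all: concatenation of all relevance-feedback text.
    rf_all = " ".join(rf for _, rf in history)
    counts = Counter(rf_all.lower().split())
    total = sum(counts.values())
    # Assumed estimate: relative frequency of each unigram in rf_all.
    return {w: c / total for w, c in counts.items()} if total else {}

history = [("jaguar", "jaguar cat habitat"),
           ("python", "python snake habitat")]
profile = learn_unigram_profile(history)
```

With this toy history, "habitat" occurs twice among six words, so it receives the largest weight in the profile.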
Sample user profile
Reranking
• In general LM for IR
• Our Approach
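The scoring formulas for this slide are missing from the extracted text. In language modeling for IR, documents are commonly ranked by the likelihood of the query under a document model. The sketch below is only an assumed analogue for personalization: each of the top k results is rescored by the length-normalized log-likelihood of its text under a unigram user profile, with a small floor `epsilon` for unseen words. The names `profile_score` and `rerank_top_k`, the floor and the data are hypothetical.

```python
import math

def profile_score(doc_text, profile, epsilon=1e-6):
    # Average log-probability of the document's words under the user profile.
    # epsilon is a crude floor for words the profile has never seen.
    tokens = doc_text.lower().split()
    if not tokens:
        return float("-inf")
    log_prob = sum(math.log(profile.get(t, 0.0) + epsilon) for t in tokens)
    return log_prob / len(tokens)

def rerank_top_k(results, profile, k=10):
    # Rescore and rearrange only the top k results; the tail keeps its order.
    head = sorted(results[:k], key=lambda r: profile_score(r[1], profile), reverse=True)
    return head + results[k:]

profile = {"cat": 0.4, "habitat": 0.3, "wildlife": 0.3}
results = [("d1", "jaguar car prices"),
           ("d2", "jaguar cat habitat"),
           ("d3", "jaguar wildlife")]
reranked = rerank_top_k(results, profile, k=2)
```

Only the first two results are reordered here, which mirrors the "take top few and rescore" idea from the solution outline.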
Noisy Channel based Approach
• Documents and queries live in different information spaces
  - Queries – short, concise
  - Documents – more descriptive
• Most approaches to retrieval or personalized web search do not model this
• Capture the relationship between query and document words
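One standard way to capture a relationship between query words and document words is a word translation table trained with IBM Model 1, which the results slide mentions as a training scheme. The sketch below is a compact EM loop for such a table, assuming training pairs of (query tokens, document tokens) and omitting the NULL word. The function name `train_model1`, the pair format and the toy data are assumptions, not the talk's setup.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """pairs: list of (query_tokens, doc_tokens). Returns t[(q, d)], an estimate of P(q | d)."""
    query_vocab = {q for qs, _ in pairs for q in qs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each query word over the document words of its pair.
        for qs, ds in pairs:
            for q in qs:
                norm = sum(t[(q, d)] for d in ds)
                for d in ds:
                    frac = t[(q, d)] / norm
                    count[(q, d)] += frac
                    total[d] += frac
        # M-step: renormalize expected counts per document word.
        t = defaultdict(float, {(q, d): c / total[d] for (q, d), c in count.items()})
    return t

pairs = [(["cheap", "flights"], ["low", "cost", "airline", "tickets"]),
         (["cheap", "hotels"], ["low", "cost", "hotel", "rooms"])]
table = train_model1(pairs)
```

After a few iterations the document word "airline" is associated more strongly with the query word "flights" than with "cheap", because "cheap" is better explained by the words shared across both pairs.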
Machine Learning based Approaches: Introduction
• Most machine learning for IR treats retrieval as a binary classification problem – “relevant” and “non-relevant”
• Click-through data
  - A click indicates relative relevance, not absolute relevance
  - i.e., assuming clicked means relevant and unclicked means irrelevant is wrong
  - Clicks are biased
• Partial relative relevance - clicked documents are more relevant than unclicked documents.
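The relative-relevance idea above can be turned into training data for a pairwise ranker such as a Ranking SVM. The sketch below follows the slide's statement literally and generates one (preferred, less preferred) pair for every clicked and unclicked document in a result list. The function name `preference_pairs` and the sample click log are hypothetical, and the ranker itself is not shown.

```python
def preference_pairs(ranked_doc_ids, clicked):
    # Each pair (c, u) states: clicked document c is preferred over unclicked document u.
    clicked = set(clicked)
    return [(c, u)
            for c in ranked_doc_ids if c in clicked
            for u in ranked_doc_ids if u not in clicked]

pairs = preference_pairs(["d1", "d2", "d3", "d4"], clicked={"d2", "d4"})
```

A pairwise learner would then be trained on the feature differences of each pair, so that no document needs an absolute relevance label.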
Personalized Search without Relevance Feedback: Introduction
• Can personalization be done without relevance feedback about which documents are relevant?
• How informative are the queries posed by users?
• Is the information contained in the queries enough to personalize?
Approach
• Past queries of the user are available
• Make effective use of past queries
• Simple N-gram based approach
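A minimal sketch of a query-only profile, assuming the profile is just the counts of unigrams and bigrams seen in the user's past queries and that a result is scored by summing the counts of the n-grams it contains. The scoring rule and the names `query_profile` and `overlap_score` are assumptions for illustration.

```python
from collections import Counter

def query_profile(past_queries):
    counts = Counter()
    for query in past_queries:
        tokens = query.lower().split()
        counts.update(tokens)                   # unigrams
        counts.update(zip(tokens, tokens[1:]))  # bigrams, stored as tuples
    return counts

def overlap_score(text, profile):
    # Sum the profile counts of every unigram and bigram found in the text.
    tokens = text.lower().split()
    grams = tokens + list(zip(tokens, tokens[1:]))
    return sum(profile[g] for g in grams)

profile = query_profile(["python pandas tutorial", "pandas merge dataframes"])
score_code = overlap_score("pandas tutorial for beginners", profile)
score_zoo = overlap_score("giant pandas at the zoo", profile)
```

The bigram match on "pandas tutorial" is what separates the programming result from the zoo result, even though both contain the word "pandas".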
Experiment Results
• Language Modeling – Best Results!
  - Interesting framework for Personalized Search
• Simple N-gram based approaches also worked well
• Noisy Channel model worked best
  - Extracting Synthetic Queries helped
• Different Training schemes
  - IBM Model 1 vs. GIZA++
  - Snippet vs. Document
• Machine Learning – competitive results
  - Different Features and weights
• Without Relevance Feedback – Very encouraging results
  - Simple Approach worked well
• Sparsity – Query log was useful
Agenda
• Evolution of Search Engines
• Information Retrieval Vs. Extraction Vs. Access
• Personalization in IR, IE and IA
• Applications in Personalized IA
• Conclusions
Personalized Search Engine for Mobile Phones
Personalized Summarization (for Mobile Devices)
“Personalized” Search Engine for mobile devices
• To develop a “personalized” search engine for mobile devices that will produce more relevant results based on the query and the “context”
• What do we mean by “personalized” search?
  - The user will be able to configure the search interfaces (explicit feedback)
  - The system will observe user behavior and customize itself to suit the user’s needs (implicit feedback)
• What do we mean by “context”?
  - User, time, location, …
The goal is to make search accessible on Nokia mobile devices and to make use of the mobile aspects for personalization.
Scope of the Application
[Figure: Client Side and Server Side]
Problem Re-Definition
• Dynamic user behavior tracking
  - An observer that keeps track of all “relevant” user actions
  - Client module
• Analysis of user actions
  - Interpret the user actions to derive user interests (categories of interests) so that more relevant results are displayed
• Construction of user profile implicitly
  - Implicit supervised learning
• Personalization
  - Based on query
  - Based on user profile
  - Based on other parameters such as time, location
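As a rough sketch of the "analysis of user actions" step, the following code turns a log of observed actions into normalized category interests. The action types, their weights and the name `interest_profile` are all hypothetical. The slides do not say how actions are weighted or which categories are used.

```python
from collections import defaultdict

# Hypothetical weights: stronger actions count more toward a category.
ACTION_WEIGHTS = {"click": 1.0, "bookmark": 2.0, "share": 3.0}

def interest_profile(actions):
    """actions: list of (action_type, category) pairs recorded by the client-side observer."""
    weights = defaultdict(float)
    for action_type, category in actions:
        weights[category] += ACTION_WEIGHTS.get(action_type, 0.0)
    total = sum(weights.values())
    # Normalize so the interests sum to 1; an empty log gives an empty profile.
    return {c: w / total for c, w in weights.items()} if total else {}

actions = [("click", "sports"), ("bookmark", "sports"), ("click", "news")]
interests = interest_profile(actions)
```

A profile like this could then be combined with the query, time and location when choosing which results to show.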
Solution Overview
Personalized Summarization: Motivation
• The success that search engine providers have found on the PC has failed to translate to the mobile phone. Why?
• Because trying to force a PC-based search experience into a mobile device falls short on a key area of usability
  - Search queries typically return hundreds of potential hits.
  - Making sense of such output is difficult.
  - The results may or may not be of interest to the user.
• We are looking for a faster and easier way to access precise information on our mobile devices.
Challenges
• Can we offer users a simpler, friendlier and more intuitive experience?
• We aim to provide more information with less payload, in the form of a summary that takes into account:
  - context
  - history
  - preferences
  - device capabilities
  - social network
System Model
[Figure: system model; the only surviving label is "Search Engine"]
Summary
• Current search engines are inadequate, and current know-how is only the tip of the iceberg
• IR, IE and IA have enjoyed huge commercial success and still have huge growth potential
• Personalization is perhaps the next big wave
• Various personalization techniques are available, yet this is a very fertile research field
• The two personalization applications shown are just examples of many possibilities.
Vasudeva Varma, IIIT Hyderabad
[email protected] or www.iiit.ac.in/~vasu
Thank You – Questions?