Date posted: 14-Dec-2015
Uploaded by: thalia-verdon
Modelling Relevance and User Behaviour in Sponsored Search using Click-Data
Adarsh Prasad, IIT Delhi
Advisors: Dinesh Govindaraj, SVN Vishwanathan*
Group: Revenue and Relevance
* Visiting Researcher from Purdue
Overview
• Click-Data appears to be an ideal source of information when deciding which Ads to show in answer to a query. It can be thought of as the result of users voting in favour of the documents they find interesting.
• This information can be fed into the ranker to tune search parameters, or it can even be used as training points for the ranker.
• The aim of the project is to develop a model that takes in Click-Data and produces output, in the form of constraints or updated ranking scores, to be used as input to the ranker.
Motivation
• The quality of training points is of critical importance for learning a ranking function.
• Currently, labeled data is collected from human judges, and human labeling is time-consuming and labor-intensive.
• Ads must also have “temporal relevance”: something relevant today might not be relevant six months later. Labeling must therefore be repeated, so the labeling process needs to be automated.
Main Difficulty: Presentation Bias
• Position: results at lower positions are less likely to be clicked even if they are relevant.
• Externalities: clicks depend on the other Ads being shown.
[1] Chapelle and Zhang. A Dynamic Bayesian Network Click Model for Web Search Ranking.
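Example [1]
Query: myspace | URL = www.myspace.com | Market = U.K.
Ranking 1:
Pos 1: uk.myspace.com, CTR = 0.97
Pos 2: www.myspace.com, CTR = 0.11
Ranking 2:
Pos 1: www.myspace.com, CTR = 0.97
The same URL, www.myspace.com, gets a CTR of 0.11 when shown below uk.myspace.com and 0.97 when shown first. Its low CTR in Ranking 1 therefore reflects its position and the result above it, not a lack of relevance.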
Procedure
• Use click data as the target: this is useful for markets with few editorial judgments.
• Train on pairwise preferences drawn from two sets: PE from editorial judgments and PC from click modeling.
Minimize:
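The objective itself is not given here. One common way to combine the two preference sets is a squared-hinge pairwise loss over PE plus a down-weighted copy over PC, and the sketch below assumes that form. It uses a linear scorer. The weight `gamma`, the regularizer `lam` and all function names are hypothetical and are not the authors' formulation.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (i, j): document i should be ranked above document j


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical linear ranking function f(x) = w . x."""
    return sum(wk * xk for wk, xk in zip(w, x))


def pairwise_loss(w: Sequence[float], X: List[Sequence[float]], pairs: List[Pair]) -> float:
    """Squared hinge on the score margin f(x_i) - f(x_j) for each preference pair."""
    total = 0.0
    for i, j in pairs:
        margin = score(w, X[i]) - score(w, X[j])
        total += max(0.0, 1.0 - margin) ** 2
    return total


def combined_objective(w, X, PE, PC, gamma=0.5, lam=0.0) -> float:
    """Editorial pairs PE at full weight, click-derived pairs PC scaled by gamma
    (assumed hyperparameter), plus optional L2 regularization lam * ||w||^2."""
    reg = lam * sum(wk * wk for wk in w)
    return pairwise_loss(w, X, PE) + gamma * pairwise_loss(w, X, PC) + reg
```

A `gamma` below 1 presumably reflects that click-derived pairs are noisier than editorial ones. Any optimizer, such as gradient descent, can then minimize `combined_objective` over `w`.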
For learning a web search ranking function, clicks can be used as a target [2] or as a feature [3].
Target
1. Derive preference relations from click patterns and feed them as constraints to the ranker (Rocky-Road):
• Position and order-of-click based constraints [4]
• Aggregate constraints
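The sketch below illustrates a position-based constraint by implementing the "Click > Skip Above" heuristic described in [4]. Under this rule, a clicked Ad is preferred over every non-clicked Ad shown above it. The sketch covers only this one rule and does not handle order-of-click or aggregate constraints. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def click_skip_above(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs: each clicked Ad is preferred over
    every non-clicked Ad ranked above it ("Click > Skip Above")."""
    prefs = []
    for i, ad in enumerate(ranking):
        if ad not in clicked:
            continue
        for above in ranking[:i]:
            if above not in clicked:  # skipped Ad shown above a clicked one
                prefs.append((ad, above))
    return prefs
```

Consider a ranking ml1..ml4 with clicks on ml2 and ml4. The rule yields (ml2, ml1), (ml4, ml1) and (ml4, ml3). No constraint is drawn between ml4 and ml2 because both were clicked.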
Feature
1. Sample clicked Ads and label them as relevant.
2. Types of sampling:
• Random
• Position-based weighted: a user clicking the ml-4 Ad is a stronger signal of relevance than a user clicking ml-1.
3. Feed them to the binary classifier.
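The two sampling schemes can be sketched as follows. Each click is assumed to be an `(ad_id, position)` pair, with position 1 meaning ml-1. The position-based scheme here gives each click a weight equal to its position. That linear weighting, the function names and the choice to sample with replacement are all hypothetical.

```python
import random
from typing import List, Tuple

Click = Tuple[str, int]  # (ad_id, position); position 1 is the top slot (ml-1)


def sampling_weights(clicks: List[Click], scheme: str = "position") -> List[float]:
    if scheme == "random":
        return [1.0] * len(clicks)  # uniform over all clicked Ads
    if scheme == "position":
        # Hypothetical weighting: proportional to position, so a click on
        # ml-4 counts four times as much as a click on ml-1.
        return [float(pos) for _, pos in clicks]
    raise ValueError(f"unknown scheme: {scheme}")


def sample_clicked_ads(clicks: List[Click], n: int, scheme: str = "position",
                       seed: int = 0) -> List[Tuple[str, int]]:
    """Draw n clicked Ads (with replacement) and label each one relevant (1)."""
    rng = random.Random(seed)
    picked = rng.choices(clicks, weights=sampling_weights(clicks, scheme), k=n)
    return [(ad_id, 1) for ad_id, _ in picked]
```

The labeled pairs returned by `sample_clicked_ads` would then serve as positive examples for the binary classifier.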
[2] Joachims. Optimizing Search Engines using Clickthrough Data.
[3] Agichtein et al. Improving Web Search Ranking by Incorporating User Behavior Information.
[4] Joachims et al. Accurately Interpreting Clickthrough Data as Implicit Feedback.
Results
EXACTMATCH BROADMATCH PHRASEMATCH SMARTMATCH
Sampling: +0.39% +1.02%
Position and Order Constraints: +1.22% +5.93% +4.15% +0.38%
Aggregate Constraints: +0.2% +5.17% +0.77% +0.5%
SAME SUPERSET DISJOINT
Sampling: +5.72% +4.22%
Position and Order Constraints: +3.1% +2.28%
Aggregate Constraints: +7.4% +5.28%
-6.28%
-3.9%
-11.3%
Fisher Score =
-0.06% -0.5%
Log Loss (Label Based)
Sampling: +0.001%
Position and Order Constraints: +3.07%
Aggregate Constraints: +1.75%
Weighted LL
Background on Click Models
• Use CTR (click-through rate) data.
• Pr(click) = Pr(examination) × Pr(click | examination), where Pr(click | examination) is the relevance.
• User browsing models are needed to estimate Pr(examination).
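The decomposition above suggests a simple correction for presentation bias. Suppose the examination probability at each position were known. The sketch assumes it as an input, although in practice it would come from a browsing model. The relevance of an Ad can then be estimated as its clicks divided by its expected number of examinations. This is a simple ratio estimator, not a full click model, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

Impression = Tuple[int, int, int]  # (position, times_shown, times_clicked) for one Ad


def estimate_relevance(impressions: List[Impression], exam_prob: Dict[int, float]) -> float:
    """Clicks divided by expected examinations, following
    Pr(click) = Pr(examination) x Pr(click | examination)."""
    clicks = sum(c for _, _, c in impressions)
    expected_exams = sum(n * exam_prob[pos] for pos, n, _ in impressions)
    return clicks / expected_exams if expected_exams > 0 else 0.0


# Assumed examination probabilities: position 1 always examined, position 2 half the time.
print(estimate_relevance([(1, 100, 30), (2, 200, 30)], {1: 1.0, 2: 0.5}))
```

With these numbers the raw CTR is 60/300 = 0.2, but the examination-corrected estimate is 60/200 = 0.3. The difference arises because half of the position-2 impressions are assumed never to have been examined.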
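Notation
• Φ(i): result at position i
• Examination event:
$$E_i = \begin{cases} 1, & \text{if the user examined } \Phi(i) \\ 0, & \text{otherwise} \end{cases}$$
• Click event:
$$C_i = \begin{cases} 1, & \text{if the user clicked on } \Phi(i) \\ 0, & \text{otherwise} \end{cases}$$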
Examination Hypothesis (Richardson et al., WWW 2007):
$$\Pr(C_i = 1) = \Pr(E_i = 1)\,\Pr(C_i = 1 \mid E_i = 1)$$
• $\alpha_i = \Pr(E_i = 1)$: position bias
• Depends solely on position.
• Can be estimated by looking at the CTR of the same result in different positions.
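Under the examination hypothesis, the CTR of a result at position p is the position bias at p times its relevance. For the same result, the ratio CTR(p) / CTR(1) therefore cancels relevance and estimates the bias at p relative to position 1. The minimal sketch below assumes click logs aggregated to one row per (result, position) and averages the ratios over results. That averaging is one simple choice among several, and the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, int]  # (result_id, position, impressions, clicks)


def relative_position_bias(log: List[Row], ref_pos: int = 1) -> Dict[int, float]:
    """Estimate alpha_p / alpha_ref for every observed position p.

    Under Pr(C = 1) = alpha_p * rel(r), the CTR ratio of the same result at
    positions p and ref_pos cancels rel(r); ratios are averaged over results."""
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # rid -> pos -> [imps, clicks]
    for rid, pos, imps, clicks in log:
        totals[rid][pos][0] += imps
        totals[rid][pos][1] += clicks
    ratios: Dict[int, List[float]] = defaultdict(list)
    for by_pos in totals.values():
        ref = by_pos.get(ref_pos)
        if ref is None or ref[0] == 0 or ref[1] == 0:
            continue  # result never shown or never clicked at the reference position
        ref_ctr = ref[1] / ref[0]
        for pos, (imps, clicks) in by_pos.items():
            if imps > 0:
                ratios[pos].append((clicks / imps) / ref_ctr)
    return {pos: sum(v) / len(v) for pos, v in sorted(ratios.items())}
```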
Examination depends on prior clicks
• Cascade model
• Dependent click model (DCM)
• User browsing model (UBM) [Dupret & Piwowarski, SIGIR 2008]
  - More general and more accurate than Cascade and DCM.
  - Conditions Pr(examination) on the closest prior click.
• Bayesian browsing model (BBM) [Liu et al., KDD 2009]
  - Same user behavior model as UBM.
  - Uses a Bayesian paradigm for relevance.
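The simplest model in this list is the cascade model. In its usual formulation, which is assumed here, the user scans results top-down and clicks result i with probability equal to its relevance. The user stops after the first click, so a result is examined only if every result above it was skipped. This implies at most one click per session, which DCM and UBM relax. The function name is hypothetical.

```python
from typing import List


def cascade_click_probs(relevance: List[float]) -> List[float]:
    """Marginal Pr(C_i = 1) under the cascade model: the user scans top-down,
    clicks result i with probability relevance[i], and stops after the first click."""
    probs = []
    p_reach = 1.0  # probability that the user examines the current position
    for r in relevance:
        probs.append(p_reach * r)
        p_reach *= 1.0 - r  # the scan continues only if this result was skipped
    return probs


print(cascade_click_probs([0.5, 0.5, 0.5]))  # [0.5, 0.25, 0.125]
```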
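User browsing model (UBM)
• Use the position of the closest prior click to predict Pr(examination):
$$\Pr(E_i = 1 \mid C_{1:i-1}) = \alpha_i \, \beta_{i,p(i)}$$
$$\Pr(C_i = 1 \mid C_{1:i-1}) = \Pr(E_i = 1 \mid C_{1:i-1}) \, \Pr(C_i = 1 \mid E_i = 1)$$
• $\alpha_i$: position bias
• $p(i)$: position of the closest prior click
• Prior clicks don't affect relevance: the last factor, $\Pr(C_i = 1 \mid E_i = 1)$, does not depend on $C_{1:i-1}$.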
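The two UBM equations can be evaluated on one observed session to score its likelihood. The sketch stores the β factors in a table keyed by (i, p(i)). It adopts the convention p(i) = 0 when there is no prior click, which is an assumption not stated on the slide. The names and the toy parameter values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def ubm_click_probs(clicks: List[int], rel: List[float], alpha: List[float],
                    beta: Dict[Tuple[int, int], float]) -> List[float]:
    """Pr(C_i = 1 | C_{1:i-1}) for each position i = 1..n of one observed session.

    clicks[i-1] is 1 if position i was clicked; rel[i-1] = Pr(C_i = 1 | E_i = 1);
    alpha[i-1] is the position bias; beta[(i, p)] is keyed by position i and the
    closest prior click p, with p = 0 meaning "no prior click" (assumption)."""
    probs = []
    last_click = 0
    for i, (c, r) in enumerate(zip(clicks, rel), start=1):
        p_exam = alpha[i - 1] * beta[(i, last_click)]  # Pr(E_i = 1 | C_{1:i-1})
        probs.append(p_exam * r)                        # relevance ignores prior clicks
        if c:
            last_click = i
    return probs


def ubm_log_likelihood(clicks, rel, alpha, beta) -> float:
    """Log-likelihood of the observed click vector under the model above."""
    probs = ubm_click_probs(clicks, rel, alpha, beta)
    return sum(math.log(p if c else 1.0 - p) for c, p in zip(clicks, probs))


# Toy session: clicks at positions 1 and 3 (hypothetical parameters).
print(ubm_click_probs([1, 0, 1], [0.8, 0.5, 0.5], [1.0, 0.9, 0.8],
                      {(1, 0): 1.0, (2, 1): 1.0, (3, 1): 0.5}))
```

Summing `ubm_log_likelihood` over many sessions gives the objective that a learning procedure for α, β and relevance would maximize.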
Other Related Work
• Examination depends on prior clicks and prior relevance:
  - Click chain model (CCM)
  - General click model (GCM)
• Post-click models:
  - Dynamic Bayesian model
  - Session utility model
User Browsing in Sponsored Search
• Is user browsing in sponsored search similar to browsing in web search?
• In organic search, the usual assumption is that users examine and click results in a linear, top-to-bottom order.
• We observed that in sponsored search, where few results are returned, a fair share (~30%) of users click out of order.
• Non-linear user behavior is a strong signal that may contain important information.
• We therefore combine the position and the temporal behavior of the user.
The statistic x counted here is the difference between the positions of consecutive clicks, taken in temporal order.
Example: if the user clicks on ml1 and then ml2, x = -1; if the user clicks on ml2 and then ml1, x = 1; and so on.
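The statistic can be tallied from click logs as shown below. Each session is assumed to be the list of clicked positions in the order the clicks happened, with ml1 given as 1. The "out of order" fraction counts multi-click sessions containing at least one upward move (x > 0). That definition and the function names are assumptions and may differ from how the ~30% figure was computed.

```python
from collections import Counter
from typing import Iterable, List


def click_order_stats(sessions: Iterable[List[int]]) -> Counter:
    """Count x = pos(earlier click) - pos(next click) over consecutive click pairs."""
    counts: Counter = Counter()
    for seq in sessions:
        for first, second in zip(seq, seq[1:]):
            counts[first - second] += 1
    return counts


def out_of_order_fraction(sessions: Iterable[List[int]]) -> float:
    """Share of sessions with 2+ clicks that contain at least one upward move."""
    multi = [s for s in sessions if len(s) >= 2]
    if not multi:
        return 0.0
    upward = sum(1 for s in multi if any(a - b > 0 for a, b in zip(s, s[1:])))
    return upward / len(multi)


sessions = [[1, 2], [2, 1], [1, 3, 2], [3]]
print(click_order_stats(sessions))      # x = -1 once, x = 1 twice, x = -2 once
print(out_of_order_fraction(sessions))  # 2 of the 3 multi-click sessions
```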
A New Model
• Allow users to move in a non-linear fashion.
• Also incorporate the notion of externalities, i.e. perceived relevance changes with other clicks.
To learn our parameters, we can use the EM algorithm:
(1) In the E step, we estimate the hidden variables with a forward-backward algorithm.
(2) In the M step, closed-form solutions maximize the expected log-likelihood.
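The new model's hidden structure is not spelled out here. As a stand-in, the sketch below shows the same EM pattern on a plain discrete hidden Markov model (Baum-Welch). A scaled forward-backward pass serves as the E step, and closed-form re-estimates of the start, transition and emission probabilities serve as the M step. This is a generic illustration, not the authors' model, and the states, symbols and function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def forward_backward(obs: List[int], pi: List[float], A: Matrix, B: Matrix):
    """E step: scaled forward-backward for a discrete HMM.
    Returns (gamma, xi, log_likelihood): gamma[t][s] = Pr(state s at t | obs),
    xi[t][r][s] = Pr(state r at t, state s at t+1 | obs)."""
    T, N = len(obs), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for s in range(N):
            prev = pi[s] if t == 0 else sum(alpha[t - 1][r] * A[r][s] for r in range(N))
            alpha[t][s] = prev * B[s][obs[t]]
        scale[t] = sum(alpha[t])  # c_t; Pr(obs) is the product of all c_t
        alpha[t] = [a / scale[t] for a in alpha[t]]
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for r in range(N):
            beta[t][r] = sum(A[r][s] * B[s][obs[t + 1]] * beta[t + 1][s]
                             for s in range(N)) / scale[t + 1]
    gamma = [[alpha[t][s] * beta[t][s] for s in range(N)] for t in range(T)]
    xi = [[[alpha[t][r] * A[r][s] * B[s][obs[t + 1]] * beta[t + 1][s] / scale[t + 1]
            for s in range(N)] for r in range(N)] for t in range(T - 1)]
    return gamma, xi, sum(math.log(c) for c in scale)


def m_step(obs: List[int], gamma: Matrix, xi, n_symbols: int):
    """M step: closed-form re-estimates of (pi, A, B) from the expected counts."""
    N = len(gamma[0])
    pi = list(gamma[0])
    A = [[sum(x[r][s] for x in xi) / sum(g[r] for g in gamma[:-1]) for s in range(N)]
         for r in range(N)]
    B = [[sum(g[s] for g, o in zip(gamma, obs) if o == k) / sum(g[s] for g in gamma)
          for k in range(n_symbols)] for s in range(N)]
    return pi, A, B


def baum_welch(obs, pi, A, B, n_symbols, iters=10):
    """Alternate E and M steps; returns final parameters and the log-likelihood trace."""
    history = []
    for _ in range(iters):
        gamma, xi, log_lik = forward_backward(obs, pi, A, B)
        history.append(log_lik)
        pi, A, B = m_step(obs, gamma, xi, n_symbols)
    return pi, A, B, history
```

EM should never decrease the log-likelihood from one iteration to the next, so the returned `history` doubles as a sanity check on an implementation.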