Improving relevance prediction by addressing biases and sparsity in web search click data
Qi Guo, Dmitry Lagun, Denis Savenkov, Qiaoling Liu
[qguo3, dlagun, denis.savenkov, qiaoling.liu]@emory.edu
Mathematics & Computer Science, Emory University
Relevance Prediction Challenge
Web Search Click Data
Relevance prediction problems
- Position-bias
- Perception-bias
- Query-bias
- Session-bias
- Sparsity
Relevance prediction problems: position-bias
- CTR is a good indicator of document relevance
- Search results are not independent
- Different positions receive different attention [Joachims+07]
[Figure: click percentage by result position, normal vs. reversed impression order]

Relevance prediction problems: perception-bias
- The user decides to click or to skip based on the snippet
- Perceived relevance may be inconsistent with intrinsic relevance
Relevance prediction problems: query-bias
- Queries are different
- CTR for difficult queries might not be trustworthy
- For infrequent queries we might not have enough data
- Navigational vs. informational intent
- Different queries require different amounts of time to get the answer
- Example queries: "P versus NP", "how to get rid of acne", "What is the capital of Honduras", "grand hyatt seattle zip code", "Why am I still single", "why is hemp illegal"

Relevance prediction problems: session-bias
- Users are different
- Query intent differs
- 30s dwell time might not indicate relevance for some types of users [Buscher et al. 2012]
Relevance prediction problems: sparsity
- Does 1 show and 1 click mean the document is relevant?
- What about 1 show and 0 clicks: is it non-relevant?
- For tail queries (infrequent doc-query-region triples) we might not have enough clicks/shows to make robust relevance predictions

Click Models
- User browsing probability models: DBN, CCM, UBM, DCM, SUM, PCC
- Don't work well for infrequent queries
- Hard to incorporate different kinds of features
Our approach
- Click models are good
- But we have different types of information we want to combine in our model
- Let's use machine learning
ML algorithms:
- AUCRank
- Gradient Boosted Decision Trees (pGBRT implementation), cast as a regression problem
Dataset
Yandex Relevance Prediction Challenge data:
- Unique queries: 30,717,251
- Unique URLs: 117,093,258
- Sessions: 43,977,859
- Regions: 4 (probably Russia, Ukraine, Belarus & Kazakhstan)
Quality measure
- AUC (Area Under the ROC Curve); a minimal evaluation sketch follows below
- Public and hidden test subsets
- Hidden subset labels aren't currently available
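Since the relevance labels are binary, a model's real-valued scores can be evaluated with standard ROC AUC, e.g. via scikit-learn. The tiny label/score arrays below are hypothetical; the source does not include an evaluation script.

```python
# Minimal AUC evaluation sketch (hypothetical data; scikit-learn assumed available).
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 0, 1, 1, 0]                 # 1 = relevant, 0 = not relevant
y_score = [0.9, 0.4, 0.35, 0.8, 0.6, 0.2]   # predicted relevance scores

print("AUC =", roc_auc_score(y_true, y_score))  # 1.0 here: every relevant doc outranks every non-relevant one
```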
Features: position-bias
- Per-position CTR (see the sketch below)
- Click-SkipAbove and similar behavior patterns
- DBN (Dynamic Bayesian Network)
- Corrected shows: shows with a click on the current position or below (cascade hypothesis)
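These counts can be collected in a single pass over the impressions. A minimal sketch, assuming the log has already been parsed into records of shown URLs in rank order plus the set of clicked URLs; the record layout here is hypothetical, not the challenge's raw log format.

```python
# Sketch: per-position CTR, SkipAbove counts, and "corrected shows" from impression records.
from collections import defaultdict

# Hypothetical pre-parsed impressions: shown URLs in rank order and the clicked URLs.
impressions = [
    {"query": "q1", "urls": ["u1", "u2", "u3"], "clicked": {"u2"}},
    {"query": "q1", "urls": ["u1", "u2", "u3"], "clicked": {"u1", "u3"}},
]

shows = defaultdict(int)            # impressions per position
clicks = defaultdict(int)           # clicks per position
corrected_shows = defaultdict(int)  # shows counted only if some click happened at this position or below
skip_above = defaultdict(int)       # URL shown above a clicked result but not clicked itself

for imp in impressions:
    clicked_positions = [i for i, u in enumerate(imp["urls"]) if u in imp["clicked"]]
    lowest_click = max(clicked_positions, default=-1)
    for pos, url in enumerate(imp["urls"]):
        shows[pos] += 1
        if pos <= lowest_click:
            corrected_shows[pos] += 1      # cascade hypothesis: user examined at least down to the lowest click
        if url in imp["clicked"]:
            clicks[pos] += 1
        elif pos < lowest_click:
            skip_above[url] += 1           # skipped while a lower result was clicked

per_position_ctr = {pos: clicks[pos] / shows[pos] for pos in sorted(shows)}
print(per_position_ctr, dict(corrected_shows), dict(skip_above))
```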
Features: perception-bias
- Post-click behavior
- Average/median/min/max/std dwell time
- Sat [Dissat] CTR (clicks with dwell time above [below] the average dwell time for the query)
- # clicks before the click on the given URL
- The only click in the query / # shows
- URL dwell time / total dwell time

Features: session-bias
- URL feature normalization at the session level:
- Dwell time > average session dwell time
- # clicks in the session
- # longest clicks in the session / # clicks
- Dwell time / session duration

Features: sparsity
- Pseudo-counts to handle sparsity
- Prior information: original ranking (average show position; shows on the i-th position / shows)
- Back-offs (more data, less precise): url-query-region, url-query, url-region, url, query-region, query (smoothing sketch below)
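The back-off chain can be turned into smoothed CTR estimates. The slides name the chain but not the exact formula, so the sketch below assumes simple additive (pseudo-count) smoothing toward the next coarser level; the counts and the alpha value are illustrative only.

```python
# Sketch of back-off CTR smoothing with pseudo-counts (formula is an assumption, not the authors' exact one).

def smoothed_ctr(clicks, shows, prior_ctr, alpha=10.0):
    """Blend the observed CTR with a prior from a coarser aggregation level.

    alpha acts as a pseudo-count: with few shows the estimate stays near the prior,
    with many shows it approaches the observed clicks/shows ratio.
    """
    return (clicks + alpha * prior_ctr) / (shows + alpha)

# Hypothetical counts at increasingly specific levels for one (url, query, region) key.
query_ctr = smoothed_ctr(clicks=5_000, shows=40_000, prior_ctr=0.10)              # query level
url_query_ctr = smoothed_ctr(clicks=3, shows=5, prior_ctr=query_ctr)              # url-query level
url_query_region_ctr = smoothed_ctr(clicks=1, shows=1, prior_ctr=url_query_ctr)   # most specific level

print(url_query_region_ctr)
```

With only one show, the most specific estimate stays close to the url-query prior (about 0.35 here) instead of jumping to a raw 1/1 = 1.0, which is exactly the sparsity problem the back-off strategy addresses.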
Parameter tuning
Later experiments:
- 5-fold CV
- Tree height h=3
- Iterations: ~250
- Learning rate: 0.1 (training sketch below)
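For reference, a minimal stand-in for this setup: scikit-learn's GradientBoostingRegressor is used here instead of the pGBRT toolkit the authors used, "tree height h=3" is mapped to max_depth=3, and the feature matrix and labels are random placeholders rather than the challenge features.

```python
# Sketch: gradient-boosted regression trees with the hyperparameters reported on the slide,
# evaluated with 5-fold cross-validated AUC. Data is synthetic; pGBRT is replaced by scikit-learn.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
X = rng.rand(1000, 20)                                      # placeholder feature matrix
y = (X[:, 0] + 0.1 * rng.randn(1000) > 0.5).astype(int)     # placeholder binary relevance labels

model = GradientBoostingRegressor(max_depth=3, n_estimators=250, learning_rate=0.1)

aucs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model.fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], model.predict(X[test_idx])))

print("5-fold CV AUC:", np.mean(aucs))
```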
Results (5-fold CV)
Baselines:
- Original ranking (average show position): 0.6126
- CTR: 0.6212
Models:
- AUC-Rank: 0.6337
- AUC-Rank + Regression: 0.6495
- Gradient Boosted Regression Trees: 0.6574

Results (5-fold CV)
- Session-bias and perception-bias features are the most important relevance signals
- Query-bias features don't work well by themselves but provide important information to the other feature groups

Results (5-fold CV)
- Query-URL level features offer the best trade-off between precision and sparsity
- Region-URL features have both problems: they are sparse and not precise
Feature importance
Conclusions
- Sparsity: the back-off strategy to address data sparsity gives a +3.1% AUC improvement
- Perception-bias: dwell time is the most important relevance signal (who would've guessed)
- Session-bias: session-level normalization helps to improve relevance prediction quality
- Query-bias: query-level information provides important additional information that helps predict relevance
- Position-bias features are useful
THANK YOU
Thanks to the organizers for such an interesting challenge & open dataset!
Thank you for listening!
P.S. Do not overfit