
Improving relevance prediction by addressing biases and sparsity in web search click data

Date post:24-Feb-2016


Qi Guo, Dmitry Lagun, Denis Savenkov, Qiaoling Liu
[qguo3, dlagun, denis.savenkov, qiaoling.liu]@emory.edu
Mathematics & Computer Science, Emory University

Relevance Prediction Challenge

Web Search Click Data

Relevance prediction problems
Position-bias
Perception-bias
Query-bias
Session-bias
Sparsity

Relevance prediction problems: position-bias
CTR is a good indicator of document relevance
Search results are not independent
Different positions receive different attention [Joachims+07]

[Figure: two panels, "Normal" and "Reversed"; recoverable axis labels: Position, Impression, Percentage]

Relevance prediction problems: perception-bias
The user decides to click or to skip based on snippets
Perceived relevance may be inconsistent with intrinsic relevance

Relevance prediction problems: query-bias
Queries are different
CTR for difficult queries might not be trustworthy
For infrequent queries we might not have enough data
Navigational vs. informational
Different queries need different amounts of time to get the answer
Queries:
P versus NP
how to get rid of acne
What is the capital of Honduras
grand hyatt seattle zip code
Why am I still single
why is hemp illegal

Relevance prediction problems: session-bias
Users are different
Query intent
A 30s dwell time might not indicate relevance for some types of users [Buscher et al. 2012]

Relevance prediction problems: sparsity
Does 1 show with 1 click mean a relevant document?
What about 1 show with 0 clicks: non-relevant?
For tail queries (infrequent doc-query-region combinations) we might not have enough clicks/shows to make a robust relevance prediction

Click Models
User browsing probability models
DBN, CCM, UBM, DCM, SUM, PCC
Don't work well for infrequent queries
Hard to incorporate different kinds of features

Our approach
Click models are good
But we have different types of information that we want to combine in our model
Let's use machine learning

ML algorithms:
AUCRank
Gradient Boosted Decision Trees (pGBRT implementation): regression problem

Dataset
Yandex Relevance Prediction Challenge data:
Unique queries: 30,717,251
Unique URLs: 117,093,258
Sessions: 43,977,859
4 regions (probably Russia, Ukraine, Belarus & Kazakhstan)

Quality measure
AUC: Area Under Curve
Public and hidden test subsets
Hidden subset labels aren't currently available

Features: position-bias
Per-position CTR
"Click-SkipAbove" and similar behavior patterns
DBN (Dynamic Bayesian Network)
Corrected shows: shows with clicks on the current position or below (cascade hypothesis)
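The per-position CTR and "corrected shows" features can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the input layout (a ranked URL list plus a set of clicked 0-based positions per result page), the function name `position_features`, and the derived `corrected_ctr` ratio are all hypothetical. Under the cascade hypothesis, a result is treated as examined only if a click occurred at its position or lower on the page.

```python
from collections import defaultdict

def position_features(impressions):
    """impressions: iterable of (urls, clicked) pairs, one per result page.
    urls is the ranked list; clicked is a set of 0-based clicked positions.
    (Hypothetical input layout, chosen for this sketch.)"""
    shows = defaultdict(int)
    clicks = defaultdict(int)
    corrected_shows = defaultdict(int)
    pos_shows = defaultdict(int)
    pos_clicks = defaultdict(int)
    for urls, clicked in impressions:
        last_click = max(clicked) if clicked else -1
        for pos, url in enumerate(urls):
            shows[url] += 1
            pos_shows[pos] += 1
            if pos in clicked:
                clicks[url] += 1
                pos_clicks[pos] += 1
            # Cascade hypothesis: count the show only if the user clicked
            # at this position or somewhere below it.
            if pos <= last_click:
                corrected_shows[url] += 1
    pos_ctr = {pos: pos_clicks[pos] / n for pos, n in pos_shows.items()}
    url_feats = {}
    for url, n in shows.items():
        cs = corrected_shows[url]
        url_feats[url] = {
            "ctr": clicks[url] / n,
            "corrected_shows": cs,
            # Assumed derived feature: clicks over corrected shows.
            "corrected_ctr": clicks[url] / cs if cs else 0.0,
        }
    return pos_ctr, url_feats

impressions = [
    (["a", "b", "c"], {1}),
    (["a", "b", "c"], set()),
    (["b", "a", "c"], {0}),
]
pos_ctr, url_feats = position_features(impressions)
```

In this toy log, URL "b" has a raw CTR of 2/3, but it was clicked on both pages where it counts as examined, so its corrected CTR is higher.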

Features: perception-bias
Post-click behavior
Average/median/min/max/std dwell time
Sat[Dissat] CTR (clicks with dwell >[average query dwell time
#clicks before the click on the given URL
The only click in query/shows
URL dwell/total dwell

Features: session-bias
URL feature normalization
>average session dwell time
#clicks in session
#longest clicks in session/clicks
dwell/session duration

Features: sparsity
Pseudo-counts for sparsity
Prior information: original ranking (average show position; shows on i-th position / shows)
Back-offs (more data, less precise):
url-query-region
url-query
url-region
url
query-region
query
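A minimal sketch of how pseudo-counts and back-off levels might be combined, assuming a simple log of (url, query, region, clicked) records. The record layout, the function names, the prior CTR of 0.1, and the pseudo-show count of 5 are illustrative assumptions, not values from the slides. Each back-off level yields one smoothed CTR feature, so a learner sees both the precise but sparse url-query-region estimate and the coarser, better-populated ones.

```python
from collections import defaultdict

# Back-off levels from most specific to least specific.
LEVELS = {
    "url-query-region": lambda u, q, r: (u, q, r),
    "url-query": lambda u, q, r: (u, q),
    "url-region": lambda u, q, r: (u, r),
    "url": lambda u, q, r: (u,),
    "query-region": lambda u, q, r: (q, r),
    "query": lambda u, q, r: (q,),
}

def aggregate(log):
    """Count [clicks, shows] per key at every back-off level."""
    stats = {name: defaultdict(lambda: [0, 0]) for name in LEVELS}
    for url, query, region, clicked in log:
        for name, key_of in LEVELS.items():
            cell = stats[name][key_of(url, query, region)]
            cell[0] += int(clicked)
            cell[1] += 1
    return stats

def backoff_features(stats, url, query, region, prior_ctr=0.1, pseudo_shows=5.0):
    """Smoothed CTR per level: (clicks + prior * m) / (shows + m).
    prior_ctr and pseudo_shows are assumed values for this sketch."""
    feats = {}
    for name, key_of in LEVELS.items():
        clicks, shows = stats[name].get(key_of(url, query, region), (0, 0))
        feats[name] = (clicks + prior_ctr * pseudo_shows) / (shows + pseudo_shows)
    return feats

log = [
    ("u1", "q1", "r1", True),
    ("u1", "q1", "r2", False),
    ("u1", "q2", "r1", False),
    ("u2", "q1", "r1", False),
]
stats = aggregate(log)
seen = backoff_features(stats, "u1", "q1", "r1")
unseen = backoff_features(stats, "u3", "q9", "r9")
```

With one show and one click, the raw url-query-region CTR would be 1.0; the pseudo-counts pull it toward the prior, and a triple with no data at any level falls back to the prior itself.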

Parameter tuning

Later experiments:
5-fold CV
Tree height h=3
Iterations: ~250
Learning rate: 0.1
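To make these hyperparameters concrete, here is a toy pure-Python gradient boosted regression tree learner with squared-error loss. It is a sketch of the general technique only: the authors used the pGBRT implementation, which this does not reproduce, and the feature columns (CTR, dwell time) and labels below are made-up illustrative data. The defaults mirror the slide: tree height 3, 250 iterations, learning rate 0.1.

```python
def fit_tree(X, residuals, depth):
    """Greedy regression tree; returns a float leaf or (feature, threshold, left, right)."""
    leaf = sum(residuals) / len(residuals)
    if depth == 0 or len(set(residuals)) == 1:
        return leaf
    best = None  # (squared error, feature index, threshold)
    for j in range(len(X[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:  # every feature is constant: no split possible
        return leaf
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            fit_tree([X[i] for i in li], [residuals[i] for i in li], depth - 1),
            fit_tree([X[i] for i in ri], [residuals[i] for i in ri], depth - 1))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def fit_gbrt(X, y, n_iter=250, lr=0.1, height=3):
    """Each iteration fits a tree to the current residuals (squared-error gradient)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_iter):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, residuals, height)
        trees.append(tree)
        pred = [p + lr * predict_tree(tree, row) for p, row in zip(pred, X)]
    return base, lr, trees

def predict_gbrt(model, row):
    base, lr, trees = model
    return base + lr * sum(predict_tree(t, row) for t in trees)

# Hypothetical features per URL: [CTR, average dwell time in seconds].
X = [[0.1, 5.0], [0.3, 12.0], [0.2, 45.0], [0.6, 60.0], [0.4, 8.0], [0.5, 90.0]]
y = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
model = fit_gbrt(X, y)
scores = [predict_gbrt(model, row) for row in X]
```

The regression scores are then used only for ranking documents, which is what the AUC measure evaluates.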

Results (5-fold CV)

Baselines:
Original ranking (average show position): 0.6126
CTR: 0.6212

Models:
AUC-Rank: 0.6337
AUC-Rank + Regression: 0.6495
Gradient Boosted Regression Trees: 0.6574
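For reference, the AUC figures reported here measure how often a relevant document is scored above a non-relevant one. A minimal pairwise sketch follows; the function name and the toy labels and scores are illustrative, and the quadratic loop is only suitable for small inputs.

```python
def auc(labels, scores):
    """Fraction of (relevant, non-relevant) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one relevant and one non-relevant item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

example = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # 3 of 4 pairs ordered correctly
```

A score of 0.5 corresponds to random ordering, which puts the gap between the baselines and the models above in context.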

Session and perception-bias features are the most important relevance signals
Query-bias features don't work well by themselves but provide important information to other feature groups
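The session and perception-bias signals referred to here are built from dwell times. A hedged sketch of what such features could look like, assuming dwell times in seconds; the function names, dictionary keys, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, median, pstdev

def dwell_features(dwells):
    """Summary statistics over the dwell times of all clicks on one URL."""
    return {
        "avg": mean(dwells),
        "median": median(dwells),
        "min": min(dwells),
        "max": max(dwells),
        "std": pstdev(dwells),  # population std; an assumed choice
    }

def session_normalized(click_dwell, session_dwells, session_duration):
    """Compare one click's dwell time with the behavior of its own session,
    so that habitually fast or slow users are judged on their own scale."""
    avg = mean(session_dwells)
    return {
        "above_session_avg": click_dwell > avg,
        "dwell_over_session_avg": click_dwell / avg,
        "dwell_over_session_duration": click_dwell / session_duration,
    }

url_stats = dwell_features([10, 30, 50])
norm = session_normalized(30, [10, 20, 30], session_duration=120)
```

Here a 30-second dwell is longer than the session's average click, which is the kind of per-user context a fixed 30-second threshold would miss.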

query-url level features are the best trade-off between precision and sparsity

region-url features have both problems: sparse and not precise

Feature importance

Conclusions
Sparsity: a back-off strategy to address data sparsity gives a +3.1% AUC improvement
Perception-bias: dwell time is the most important relevance signal (who would've guessed)
Session-bias: session-level normalization helps to improve relevance prediction quality
Query-bias: query-level information provides important additional information that helps predict relevance
Position-bias features are useful

THANK YOU
Thanks to the organizers for such an interesting challenge & open dataset!

Thank you for listening!

P.S. Do not overfit
