Improving Similarity Measures for Short Segments of Text
Scott Wen-tau Yih & Chris Meek, Microsoft Research
Query Suggestion
How similar are they?
query: mariners
mariners vs. seattle mariners
mariners vs. 1st mariner bank
Keyword Expansion for Online Ads
Chocolate Cigarettes
Chocolate candy
Chocolate cigars
Nostalgic candy
Novelty candy
Candy cigarettes
Old fashioned candy
How similar are they?
chocolate cigarettes vs. cigarettes
chocolate cigarettes vs. chocolate cigars
chocolate cigarettes vs. old fashioned candy
Measuring Similarity
Goal: create a similarity function fsim: (String1, String2) → ℝ
Rank suggestions: fix String1 as q; vary String2 as s1, s2, …, sk
Whether the function is symmetric is not important
For query suggestion – fsim(q,s)
fsim(“mariners”, “seattle mariners”) = 0.9
fsim(“mariners”, “1st mariner bank”) = 0.6
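A minimal sketch of how such a function is consumed: fix q and sort the candidate suggestions by score. The toy_fsim below is a hypothetical word-overlap stand-in, not the paper's measure; it merely reproduces the ordering implied by the example scores above.

```python
# Minimal sketch: ranking candidate suggestions with a similarity
# function fsim(q, s) -> float. toy_fsim is a hypothetical stand-in.

def rank_suggestions(q, suggestions, fsim):
    """Sort candidate suggestions by descending similarity to q."""
    return sorted(suggestions, key=lambda s: fsim(q, s), reverse=True)

def toy_fsim(q, s):
    """Toy word-overlap (Jaccard) score, for illustration only."""
    Q, S = set(q.split()), set(s.split())
    return len(Q & S) / len(Q | S)

print(rank_suggestions("mariners",
                       ["1st mariner bank", "seattle mariners"],
                       toy_fsim))
# -> ['seattle mariners', '1st mariner bank']
```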
Enabling Useful Applications
Web search
  Ranking query suggestions
  Segmenting web sessions using query logs
Online advertising
  Suggesting alternative keywords to advertisers
  Matching similar keywords to show ads
Document writing
  Providing alternative phrasings
  Correcting spelling errors
Challenges
Short text segments may not overlap
  “Microsoft Research” vs. “MSR” → 0 cosine score
Ambiguous terms
  “Bill Gates” vs. “Utility Bill” → 0.5 cosine score
  “taxi runway” vs. “taxi” → 0.7 cosine score
Text segments may rarely co-occur in the corpus
  “Hyatt Vancouver” vs. “Haytt Vancover” → 1 page
  Longer query → fewer pages
Our Contributions
Web-relevance similarity measure
  Represent the input text segments as real-valued term vectors using Web documents
  Improve the term weighting scheme based on relevant keyword extraction
Learning similarity measures
  Fit user preferences for the application better
  Compare learning a similarity function vs. learning a ranking function
Outline
Introduction
  Problem, applications, challenges
Our Methods
  Web-relevance similarity function
  Combining similarity measures using learning
    Learning a similarity function
    Learning a ranking function
Experiments on query suggestion
Web-relevance Similarity Measure
Query expansion of x using a search engine
  Let Dn(x) be the set of top-n documents
  Build a term vector vi for each document di ∈ Dn(x)
    Elements are scores representing the relevancy of the words in document di
  C(x) = (1/n) Σi vi/‖vi‖  (centroid of the L2-normalized vectors)
  QE(x) = C(x)/‖C(x)‖  (L2-normalized)
The similarity score is simply the inner product:
  fsim(q, s) = QE(q) · QE(s)
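A sketch of this pipeline under stated assumptions: search_top_n(x, n) and term_relevancy(term, doc) are hypothetical stand-ins for the search engine and the term-weighting model (TFIDF in the Web-kernel variant below, learned relevancy in ours).

```python
import math
from collections import defaultdict

def l2_normalize(vec):
    """Return the L2-normalized copy of a sparse term vector."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)

def query_expansion(x, search_top_n, term_relevancy, n=50):
    """Build QE(x): the L2-normalized centroid of the L2-normalized
    term vectors of the top-n documents retrieved for x.

    search_top_n(x, n) and term_relevancy(term, doc) are assumed
    interfaces, not the paper's actual implementation."""
    docs = search_top_n(x, n)
    centroid = defaultdict(float)
    for d in docs:
        v = l2_normalize({t: term_relevancy(t, d) for t in d.split()})
        for t, w in v.items():
            centroid[t] += w / len(docs)
    return l2_normalize(centroid)

def web_relevance_sim(q, s, search_top_n, term_relevancy):
    """fsim(q, s) = QE(q) . QE(s): inner product of the two vectors."""
    qe_q = query_expansion(q, search_top_n, term_relevancy)
    qe_s = query_expansion(s, search_top_n, term_relevancy)
    return sum(w * qe_s.get(t, 0.0) for t, w in qe_q.items())
```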
Web-kernel Similarity
Relevancy = TFIDF [Sahami & Heilman ’06]
Why TFIDF?
  High TF: important or relevant to the document
  High DF: stopwords or words in template blocks
  A crude estimate of the importance of a word
Can we do better than TFIDF?
Web-relevance Similarity
Relevancy = Prob(relevance | wj, di)
Keyword extraction can judge the importance of words more accurately! [Yih et al. WWW-06]
  Assigns relevancy scores (probabilities) to words/phrases
  Machine learning model trained by logistic regression
  Uses more than 10 categories of features, e.g.:
    Query-log frequency (high-DF words may be popular queries)
    The position of the word in the document
    The format, hyperlinks, etc.
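A hedged sketch of the per-term weighting: the logistic form is standard, but the feature names here are illustrative placeholders, not the exact feature set of [Yih et al. WWW-06].

```python
import math

def term_relevance_prob(features, weights, bias=0.0):
    """Prob(relevance | w_j, d_i) under a logistic regression model.

    `features` maps feature names (e.g. 'tf', 'query_log_freq',
    'in_title', 'in_anchor'; names are illustrative) to values for
    one (word, document) pair; `weights` holds learned coefficients."""
    z = bias + sum(weights.get(name, 0.0) * val
                   for name, val in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```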
Learning Similarity
Similarity measures should depend on the application
  q = “Seattle Mariners”; s1 = “Seattle”; s2 = “Seattle Mariners Ticket”
  Let human subjects decide what’s similar
Parametric similarity function fsim(q, s | w)
  Learn the parameters (weights) w from data
  Use machine learning to combine multiple base similarity measures
Base Similarity Measures
Surface matching methods (see the sketch after this list)
  Suppose Q and S are the sets of words in a given pair of query q and suggestion s:
    Matching  |Q∩S|
    Dice      2|Q∩S| / (|Q|+|S|)
    Jaccard   |Q∩S| / |Q∪S|
    Overlap   |Q∩S| / min(|Q|,|S|)
    Cosine    |Q∩S| / sqrt(|Q|·|S|)
Corpus-based methods
  Web-relevance, Web-kernel, KL-divergence
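The surface measures translate directly into code; a minimal sketch with the set-of-words semantics defined above (non-empty inputs assumed):

```python
import math

def surface_similarities(q, s):
    """The five surface-matching measures from this slide, treating
    q and s as sets of words (assumes both are non-empty)."""
    Q, S = set(q.split()), set(s.split())
    inter = len(Q & S)
    return {
        "Matching": inter,
        "Dice":     2 * inter / (len(Q) + len(S)),
        "Jaccard":  inter / len(Q | S),
        "Overlap":  inter / min(len(Q), len(S)),
        "Cosine":   inter / math.sqrt(len(Q) * len(S)),
    }

print(surface_similarities("seattle mariners", "mariners"))
```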
Learning Similarity Function
Data: pairs of query and suggestion (qi, sj)
  Label: relevance judgment (rel = 1 or rel = 0)
  Features: scores on (qi, sj) provided by multiple base similarity measures
We combine them using logistic regression:
  z = w1·Cosine(q,s) + w2·Dice(q,s) + w3·Matching(q,s) + w4·Web-relevance(q,s) + w5·KL-divergence(q,s) + …
  fsim(q, s | w) = Prob(rel | q, s; w) = exp(z) / (1 + exp(z))
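A minimal sketch of the combined scorer; the measure and weight names are placeholders, and the coefficients would come from fitting logistic regression to the labeled (q, s) pairs.

```python
import math

def learned_similarity(q, s, base_measures, weights, bias=0.0):
    """fsim(q, s | w) = exp(z) / (1 + exp(z)), with z a weighted sum
    of base similarity scores.

    base_measures maps names to scoring functions f(q, s) -> float
    (e.g. cosine, dice, web-relevance); weights holds the learned
    coefficients, keyed by the same names."""
    z = bias + sum(weights[name] * f(q, s)
                   for name, f in base_measures.items())
    return 1.0 / (1.0 + math.exp(-z))
```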
Learning Ranking Function
We compare suggestions sj, sk for the same query q
Data: tuples of a query q and suggestions sj, sk
  Label: [sim(q,sj) > sim(q,sk)] or [sim(q,sj) < sim(q,sk)]
  Features: scores on the pairs (q,sj) and (q,sk) provided by multiple base similarity measures
Learn a probabilistic model using logistic regression:
  Prob([sim(q,sj) > sim(q,sk)] | q, sj, sk; w)
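A sketch of the pairwise construction, assuming the same base_measures dict as above: each training example is the vector of score differences between two suggestions for one query, labeled by which suggestion the judges preferred.

```python
def pairwise_example(q, s_j, s_k, prefer_j, base_measures):
    """One ranking training example: feature vector of score
    differences on (q, s_j) vs. (q, s_k), labeled 1 if s_j was
    judged more similar to q than s_k, else 0.

    Fitting logistic regression to these (x, y) examples yields
    Prob([sim(q,sj) > sim(q,sk)] | q, sj, sk; w)."""
    x = [f(q, s_j) - f(q, s_k) for f in base_measures.values()]
    y = 1 if prefer_j else 0
    return x, y
```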
Experiments
Data: query suggestion dataset [Metzler et al. ’07]
  122 queries, 4,852 (query, suggestion) pairs
  Labels collapsed into {Excellent, Good} (relevant) vs. {Fair, Bad} (not relevant)
Results
  10-fold cross-validation
  Evaluation metrics: AUC and Precision@k
Query                    Suggestion               Label
shell oil credit card    shell gas cards          Excellent
shell oil credit card    texaco credit card       Fair
tarrant county college   fresno city college      Bad
tarrant county college   dallas county schools    Good
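A sketch of the two evaluation metrics; the AUC form here is the standard rank statistic (the probability that a relevant item is scored above a non-relevant one, ties counting half), which is one common way to compute it.

```python
def precision_at_k(ranked_labels, k=3):
    """Fraction of relevant suggestions among the top k, where
    ranked_labels holds the 0/1 relevance of each suggestion in the
    order the model ranked them."""
    top = ranked_labels[:k]
    return sum(top) / len(top)

def auc(scores, labels):
    """AUC as the probability that a random relevant suggestion
    outscores a random non-relevant one (assumes both classes
    are present)."""
    pos = [sc for sc, y in zip(scores, labels) if y == 1]
    neg = [sc for sc, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```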
[Bar chart: AUC scores of the ten compared methods, ranging from 0.606 to 0.739]
[Bar chart: Precision@3 of the ten compared methods, ranging from 0.389 to 0.569]
Conclusions
Web-relevance
  New term-weighting scheme derived from keyword extraction
  Outperforms existing methods on query suggestion
Learning similarity
  Fits the application: better suggestion ranking
  Learning a similarity function vs. learning a ranking function
Future work
  Experiment with alternative combination methods
  Explore other probabilistic models for similarity
  Apply our similarity measures to different tasks