Microsoft Research
Online Search and Advertising, Future and Present
Chris Burges, Microsoft Research
Saturday, Dec 13, 2008
Text Mining, Search and Navigation 1
Contents
• Search and Advertising – some ideas
• Where are we headed?
• How to begin?
• Some new results on ranking: we can directly learn Information Retrieval measures
• Internet security and RSA: why worry?
Why Search Works…
• Traditional: print, TV, radio, billboards, …
• Only very broadly targeted to demographics (some exceptions)
• Search is monetarily successful because advertising is more precisely targeted
• The Google model is giving, and will continue to give, traditional channels a run for their money
Key Points
• The online experience will be more deeply engaging.
What’s wrong with what we do now?
• Nothing, but… ten blue links + ads, ten years from now?
• Ads are ‘tacked on’ to the user experience.
• Paid Search / Contextual / Banner – all are still largely impersonal.
• But, Behavioral Targeting…
How might ads be targeted better?
• I just bought a car – don’t show me more ads for cars
• I just bought a house – show me ads for furniture
• I like band X, but not Y
• In general, build a model of what I’m in the market for
• Per-user pricing, availability
• User-driven asks (show me all ads for Z)
User Models
• User models can be used to enrich the online experience, not just advertising.
  – Automated teaching: need a model of the user’s understanding.
• Find other users with similar interests
• Tailor news presentation to user’s interests
Key Points
• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge
Search: Somewhere in the Near Future
[Diagram: a query is mapped, via structured data, to a distribution over intents (e.g. 84% informational, 12% navigational, 4% transactional; 78% commercial, …); indexed web data and structured data (diversity; popular pages; aiding the transaction) feed the display, with human–computer dialog refining the loop.]
How to get the information we need, to build good models for users?
Ask them!
Key Points
• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge, and more.
• Natural Language Processing will be key.
Search Applications: And, Data Changes Everything
• Example: AskMSR (Brill, Dumais, Banko, ACL 2002)
• Commonly used resources for QA:
  • Part-of-speech tagger, parser, named-entity extractor, WordNet or other knowledge bases, passage or sentence retrieval, abduction, etc.
• AskMSR doesn’t use any of them
• Instead, AskMSR focuses on data:
  • There is a lot of data on the web – use it
  • Redundancy is a resource to be exploited
• Data-driven QA: simple techniques, lots of data
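The redundancy idea can be sketched with a toy answer extractor: after dropping stopwords, the most frequent n-gram across search snippets is taken as the candidate answer. This is only a sketch in the AskMSR spirit, not its implementation; the snippets and stopword list here are invented.

```python
from collections import Counter
import re

STOP = {"the", "is", "in", "on", "at", "of", "a", "and"}

def askmsr_style_answer(snippets, n=2):
    """Toy redundancy-based QA: count n-grams across snippets,
    return the most frequent one as the candidate answer."""
    counts = Counter()
    for s in snippets:
        words = [w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOP]
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(1)[0][0]

snippets = [
    "Mount Everest is the tallest mountain on Earth.",
    "The tallest mountain in the world is Mount Everest.",
    "Everest, at 8848 m, is the tallest mountain.",
]
print(askmsr_style_answer(snippets))  # -> "tallest mountain"
```

The point matches the slide: no parser, no knowledge base, just counting over redundant data.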
Data Changes Everything
Banko and Brill, Mitigating…, ICHLTR 2001
Data Changes Everything
Banko and Brill, Scaling…, 2001
Key Points
• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge, and more.
• Natural Language Processing will be key.
• “Search” can be the engine under the hood for many different applications.
• It’s better to use tons of data and simple models, versus smaller datasets and complex models.
How to proceed?
• Don’t know. But: Sam, a Search Chatbot.
  – Provide an engaging chat experience
  – Use Search to show images, urls, videos, …
  – Will build persistent user world models
  – Will have its own world model
  – Can show precisely targeted ads
  – Will leverage social networks
The Eliza Effect
• Eliza: J. Weizenbaum, 1966 (!)
• Demonstrated that extremely simple techniques can result in compelling dialog (sometimes, for some users)
• Users tend to anthropomorphize computer behavior
• This gives us an advantage
Our Prime Directive in Building Sam:
Do as little supervision as possible.
Let the Data do the Work
• anarchism, categories: anarchism; political ideologies; political philosophies; social philosophy
• autism, categories: autism; pervasive developmental disorders; childhood psychiatric disorders; communication disorders; neurological disorders
• albedo, categories: electromagnetic radiation; climatology; climate forcing; scattering, absorption and radiative transfer (optics); radiometry
• abu dhabi, categories: abu dhabi; capitals in asia; cities in the united arab emirates; coastal cities
• a, categories: latin letters; vowel letters
Robert Rounthwaite, TMSN
Using Category Graphs to Drive Dialog
– User: I like ferrets.
– Ferret: category: animals people keep as pets
– Animals people keep as pets: rabbits
– Sam: Do you like rabbits, too?
Use ODP and Wikipedia hierarchies to construct graph
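The topic → category → sibling-topic walk behind this exchange can be sketched in a few lines. The miniature graph below is invented for illustration; a real system would populate it from the ODP and Wikipedia hierarchies.

```python
# Hypothetical miniature of a category graph built from ODP / Wikipedia.
topic_to_cats = {
    "ferret": ["animals people keep as pets"],
    "rabbit": ["animals people keep as pets"],
    "goldfish": ["animals people keep as pets"],
}

def suggest_sibling(topic):
    """Walk topic -> category -> sibling topic to drive a follow-up question."""
    cats = set(topic_to_cats.get(topic, []))
    for other, other_cats in topic_to_cats.items():
        if other != topic and cats & set(other_cats):
            return f"Do you like {other}s, too?"
    return None

print(suggest_sibling("ferret"))  # -> "Do you like rabbits, too?"
```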
Use Category Graphs to Build Models
• Attach vector to each node, sparsely:
  – [like/dislike; interested/not; knows about; …]
• Each component has confidence level
• Leverage graph structure to explore
“World model” for both user and for Sam
Other Useful Sources of Data
[Diagram: bipartite query–URL click graph, with query nodes qᵢ, qᵢ₊₁, qᵢ₊₂, … linked to URL nodes uⱼ, uⱼ₊₁, uⱼ₊₂, …; random-walk hitting times on this graph yield query suggestions.]
Q. Mei, D. Zhou, K. Church, Query Suggestion Using Hitting Time, CIKM 2008
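The hitting-time idea can be sketched with a Monte Carlo random walk on a toy click graph. The graph and parameters below are invented; Mei et al. compute hitting times exactly by iteration rather than by simulation.

```python
import random

# Hypothetical bipartite click graph: query nodes <-> URL nodes.
edges = {
    "q:flowers": ["u:1", "u:2"],
    "q:florist": ["u:2", "u:3"],
    "q:plumber": ["u:4"],
    "u:1": ["q:flowers"],
    "u:2": ["q:flowers", "q:florist"],
    "u:3": ["q:florist"],
    "u:4": ["q:plumber"],
}

def mean_hitting_time(start, target, walks=2000, max_steps=50, seed=0):
    """Estimate the expected number of steps for a random walk from
    `start` to first reach `target`; inf if it is never reached."""
    rng = random.Random(seed)
    total, hits = 0, 0
    for _ in range(walks):
        node = start
        for step in range(1, max_steps + 1):
            node = rng.choice(edges[node])
            if node == target:
                total += step
                hits += 1
                break
    return total / hits if hits else float("inf")

# A related query is hit quickly; an unrelated one is never reached.
print(mean_hitting_time("q:flowers", "q:florist"))
print(mean_hitting_time("q:flowers", "q:plumber"))  # inf: disconnected
```

Queries with small hitting time from the user's query are good suggestions.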
Temporal Querying Behavior
[Plot: lift in probability of issuing the queries “resume” and “moving”, as a function of days since the query “interview”, over a ±300-day window; lift ranges roughly from 0.4 to 2.]
M. Richardson, ACM Transactions on the Web 2008
We Are Not Alone
• http://quizbot.trueknowledge.com/
• http://www.jabberwacky.com/
Jab: Do you like computers?
Me: Yes, but only when they are turned on.
Jab: I like ferraris, do you like them also?
Me: Not really. Why do you like them?
Jab: Yes.
Me: Why do you like them?
Jab: No, I like going to mars.
One Possible Sentence Generator
• Inputs:
  – Sentiment
  – Distribution over topics under discussion
  – Features from recent sentences
  – Sentence or phrase database (with statistics)
  – Distributions over user’s likes / interests, etc.
  – Close or popular nodes where bot lacks knowledge of user
  – Topic priors
• Output: ranked sentences
New Challenges for Machine Learning
• How can we teach a chatbot to talk?
  – “Good / bad response” buttons: reinforcement learning?
  – ESP-like games for labeling, for learning to rank sentences?
  – Build natural sentences from phrases?
• How can we learn effective user models?
  – Combine from multiple users to form good priors
  – Use active learning during chat to reduce uncertainty in the user’s model
Demo
Joint work with Scott Imig, Silviu Cucerzan
S. Cucerzan, Large-Scale Named Entity Disambiguation Based on Wikipedia Data, Proc. 2007 Joint Conference on EMNLP and CoNLL
Empirical Optimality of λ-Rank
Joint work with:
– Pinar Donmez (CMU)
– Krysta Svore (MSR)
– Yisong Yue (Cornell)
Some IR Measures
• Precision: fraction of returned documents that are relevant. Recall: fraction of relevant documents that are returned.
• Average Precision: compute precision at each position holding a relevant document, average over those positions
• Mean Average Precision (MAP): average AP over queries
• Mean Reciprocal Rank (TREC QA): 1 / (rank of first relevant document), averaged over queries
• Mean NDCG: NDCG@L ≡ N_L Σ_{r=1..L} (2^{l_r} − 1) / log(1 + r), averaged over queries, where N_L normalizes so a perfect ranking scores 1
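These measures are short to state in code. A sketch (binary labels for AP and MRR, graded labels with the 2^l − 1 gain for NDCG):

```python
import math

def average_precision(labels):
    """labels: binary relevance, in model-ranked order."""
    hits, score = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            score += hits / i  # precision at this relevant position
    return score / max(hits, 1)

def reciprocal_rank(labels):
    for i, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def ndcg(labels, k):
    """labels: graded relevance, in model-ranked order."""
    dcg = sum((2 ** l - 1) / math.log2(1 + r)
              for r, l in enumerate(labels[:k], start=1))
    ideal = sum((2 ** l - 1) / math.log2(1 + r)
                for r, l in enumerate(sorted(labels, reverse=True)[:k], start=1))
    return dcg / ideal if ideal > 0 else 0.0

ranked = [1, 0, 1, 0]             # binary labels, in ranked order
print(average_precision(ranked))  # (1/1 + 2/3) / 2 = 0.833...
print(reciprocal_rank(ranked))    # 1.0
print(ndcg([3, 2, 0, 1], k=4))
```

Note all three depend on the scores only through the sorted order, which is exactly why they are flat or discontinuous in the scores.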
IR Measures, cont.
These measures:
• Depend only on the labels and the sorted order of the documents
• Viewed as a function of the scores output by some model, are everywhere either flat or discontinuous
  – SVM MAP: Yue et al., SIGIR ’07
  – Tao Qin, Tie-Yan Liu, Hang Li, MSR Tech Report 164 (2008)
The RankNet Cost
Modeled posteriors: P_ij ≡ P(doc i ranked above doc j)
Target posteriors: P̄_ij
Define: o_i ≡ f(x_i), o_ij ≡ o_i − o_j
Cross entropy cost: C = −P̄_ij log P_ij − (1 − P̄_ij) log(1 − P_ij)
Model output probabilities using logistic: P_ij = 1 / (1 + e^{−o_ij})
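A sketch of the pair cost defined above, in the form used by RankNet (cross entropy between the target pair posterior and a logistic of the score difference):

```python
import math

def ranknet_cost(s1, s2, p_target):
    """Cross entropy between target posterior p_target = P(doc1 > doc2)
    and modeled posterior P12 = logistic(s1 - s2)."""
    o = s1 - s2
    p_model = 1.0 / (1.0 + math.exp(-o))
    return -p_target * math.log(p_model) - (1 - p_target) * math.log(1 - p_model)

# Equivalent closed form: C = -p_target * o + log(1 + e^o)
print(ranknet_cost(1.5, 0.0, 1.0))   # small: model agrees with target
print(ranknet_cost(-1.5, 0.0, 1.0))  # larger: model disagrees
```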
[Plot: the RankNet cost C(o₁ − o₂) versus o₁ − o₂, for target posteriors P = 0.0, 0.5, 1.0; each curve is convex in the score difference.]
RankNet Cost ~ Pairwise Cost
[Plot: C(o₁ − o₂) versus o₁ − o₂; for large |o₁ − o₂| the RankNet cost becomes linear in the score difference, so it behaves like a pairwise cost.]
Pairwise Cost Revisited
Pairwise cost is fine if there are no errors, but:
[Figure: two example rankings of the same documents, with 13 and 11 pairwise errors respectively.]
LambdaRank
Instead of using a smooth approximation to the cost, and taking derivatives, write down the derivatives directly.
Then use these derivatives to train a model using gradient descent, as usual.
The Lambda Function
NDCG gain from swapping the members of a pair of docs, multiplied by the RankNet cost gradient as a smoother:
Let U_i (D_i) be the set of documents labeled higher (lower) than document i; then
λ_i = Σ_{j∈D_i} |ΔNDCG_ij| / (1 + e^{s_i − s_j}) − Σ_{j∈U_i} |ΔNDCG_ij| / (1 + e^{s_j − s_i})
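One concrete reading of this lambda, as a sketch: the exact weighting and sign conventions below are assumptions in the spirit of the later LambdaRank write-ups, not a transcription of the talk.

```python
import math

def ndcg_delta(labels, i, j, k=10):
    """|change in DCG| from swapping the documents at ranks i and j (0-based)."""
    def gain(l, r):
        return (2 ** l - 1) / math.log2(2 + r) if r < k else 0.0
    return abs(gain(labels[i], i) + gain(labels[j], j)
               - gain(labels[i], j) - gain(labels[j], i))

def lambdas(scores, labels):
    """Per-document lambda: each pair contributes |delta NDCG| times the
    RankNet-style gradient 1/(1 + e^{s_i - s_j}), pushing the more
    relevant document up and the less relevant one down."""
    n = len(scores)
    lam = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:  # i should rank above j
                force = ndcg_delta(labels, i, j) / (1 + math.exp(scores[i] - scores[j]))
                lam[i] += force
                lam[j] -= force
    return lam

print(lambdas(scores=[0.1, 0.9], labels=[2, 0]))  # doc 0 pushed up, doc 1 down
```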
Local Optimality
• Check that the gradient vanishes at the solution.
• Get a bound on the probability that we’re not at a local max, using one-sided Monte Carlo
P(we miss an ascent direction despite k trials) ≤ (1 − p)^k, where p is the fraction of directions that are ascent directions.
How large must k be to push this below a desired bound?
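The one-sided Monte Carlo check can be sketched directly: sample random directions and see whether any step increases the objective. The test function, step size, and trial count below are illustrative.

```python
import math, random

def prob_miss_bound(p, k):
    """If a fraction p of random directions are ascent directions,
    P(all k independent samples miss) <= (1 - p)**k."""
    return (1 - p) ** k

def looks_locally_optimal(f, x, trials=500, eps=1e-3, seed=0):
    """One-sided Monte Carlo: sample random unit directions; if none
    increases f, we are at a local max with high probability (never certainty)."""
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(trials):
        d = [rng.gauss(0, 1) for _ in x]
        norm = math.sqrt(sum(c * c for c in d))
        y = [xi + eps * c / norm for xi, c in zip(x, d)]
        if f(y) > fx:
            return False  # found an ascent direction
    return True

f = lambda x: -(x[0] ** 2 + x[1] ** 2)  # max at the origin
print(looks_locally_optimal(f, [0.0, 0.0]))  # True
print(looks_locally_optimal(f, [0.5, 0.0]))  # False
print(prob_miss_bound(p=0.01, k=1000))       # ~4.3e-5
```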
Data Sets
• Artificial: 300 features, 50 urls/query, 10k/5k/10k train/valid/test split
• Web 1: 420 features, 26 urls/query, 10k/5k/10k split
• Web 2: 30k/5k/10k split
Which function to choose?
Lambda Gradient MAP ± SE MRR ± SE
RankNetWeightPairs 0.462 ± 0.0048 0.524 ± 0.0059
LocalGradient 0.435 ± 0.0048 0.515 ± 0.0060
LocalCost 0.427 ± 0.0049 0.512 ± 0.0059
SpringSmooth 0.424 ± 0.0048 0.498 ± 0.0058
DiscreteGradient 0.401 ± 0.0049 0.471 ± 0.0059
• LocalGradient: finite element estimate of gradient, with margin
• LocalCost: estimate local gradient using neighbors + weighted RankNet cost
• SpringSmooth: smoother version of RankNetWeightPairs
• DiscreteGradient: finite element estimate using optimal position
Sample Size Matters
Test measure | Train measure | 10K Train score | 30K Train score
NDCG | NDCG | 0.416 | 0.428
NDCG | MAP | 0.412 | 0.422
NDCG | MRR | 0.396 | 0.406
MAP | NDCG | 0.442 | 0.453
MAP | MAP | 0.439 | 0.456
MAP | MRR | 0.429 | 0.449
MRR | NDCG | 0.519 | 0.532
MRR | MAP | 0.516 | 0.533
MRR | MRR | 0.508 | 0.537
• Number of pairs drops by >2 for MRR and MAP
• For MRR, the number of samples drops much further
IR Measure Optimality - Conclusions
• Typically, IR practitioners would train models with small numbers of ‘smart’ features (~ BM25), and perform grid search
• However, adding many weak features improves performance
• We have shown that the LambdaRank gradients optimize three IR measures directly
Factoring biprimes as optimization
• The security of internet commerce (SSL, RSA) rests on a mathematical conjecture, namely, that factoring biprimes is combinatorially hard.
• Conjectures aren’t necessarily true. If this conjecture is false, there is no simple backup plan.
• The current fastest known factoring method is the general number field sieve. It is slow: its heuristic running time is exp(((64/9)^{1/3} + o(1)) (ln N)^{1/3} (ln ln N)^{2/3}).
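The GNFS heuristic running time can be evaluated for standard key sizes to see the growth; the o(1) term is ignored, so these are rough scale estimates, not operation counts.

```python
import math

def gnfs_ops(bits):
    """Heuristic GNFS cost exp((64/9)^(1/3) (ln N)^(1/3) (ln ln N)^(2/3))
    for N ~ 2**bits, ignoring the o(1) term."""
    ln_n = bits * math.log(2)
    return math.exp((64 / 9) ** (1 / 3) * ln_n ** (1 / 3)
                    * math.log(ln_n) ** (2 / 3))

for bits in (512, 768, 1024, 2048):
    print(bits, f"{gnfs_ops(bits):.2e}")
```

Doubling the key size multiplies the cost by many orders of magnitude, which is why the larger RSA challenge numbers on the next slide remain open.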
Circumstantial Evidence That Factoring Is Not NP-hard
• There are very few known problems that quantum computers could solve exponentially faster than classical computers.
• Factoring is one of them (Shor, ’94). The “discrete logarithm” is one more.
• Much work since then to find a quantum algorithm that solves an NP-complete problem has failed. A quantum computer must use domain knowledge (S. Aaronson, 2008).
• Searching a list of solutions gives only a quadratic, not exponential, speedup (Grover, ’96).
Is This The Best We Can Do?
Even exponential complexity, but with better N dependence, would be interesting.
RSA Challenge
RSA Number | Decimal digits | Binary digits | Cash prize offered | Factored on | Factored by
RSA-100 | 100 | 330 | | April 1991 | Arjen K. Lenstra
RSA-110 | 110 | 364 | | April 1992 | Arjen K. Lenstra and M.S. Manasse
RSA-120 | 120 | 397 | | June 1993 | T. Denny et al.
RSA-129 | 129 | 426 | $100 USD | April 1994 | Arjen K. Lenstra et al.
RSA-130 | 130 | 430 | | April 10, 1996 | Arjen K. Lenstra et al.
RSA-140 | 140 | 463 | | February 2, 1999 | Herman J. J. te Riele et al.
RSA-150 | 150 | 496 | | April 16, 2004 | Kazumaro Aoki et al.
RSA-155 | 155 | 512 | | August 22, 1999 | Herman J. J. te Riele et al.
RSA-160 | 160 | 530 | | April 1, 2003 | Jens Franke et al., University of Bonn
RSA-170 | 170 | 563 | | open |
RSA-576 | 174 | 576 | $10,000 USD | December 3, 2003 | Jens Franke et al., University of Bonn
RSA-180 | 180 | 596 | | open |
RSA-190 | 190 | 629 | | open |
RSA-640 | 193 | 640 | $20,000 USD | November 2, 2005 | Jens Franke et al., University of Bonn
RSA-200 | 200 | 663 | | May 9, 2005 | Jens Franke et al., University of Bonn
RSA-210 | 210 | 696 | | open |
RSA-704 | 212 | 704 | $30,000 USD | open |
RSA-768 | 232 | 768 | $50,000 USD | open |
RSA-896 | 270 | 896 | $75,000 USD | open |
RSA-1024 | 309 | 1024 | $100,000 USD | open |
RSA-1536 | 463 | 1536 | $150,000 USD | open |
RSA-2048 | 617 | 2048 | $200,000 USD | open |
(Wikipedia)
Represent the Problem in Binary

Write the factors with unknown bits, (1 𝑥2 𝑥1 1)₂ × (1 𝑦1 1)₂ = 1011011₂; the long-multiplication rows are (1 𝑥2 𝑥1 1), (𝑦1 𝑥2𝑦1 𝑥1𝑦1 𝑦1), and (1 𝑥2 𝑥1 1). Matching the columns of the product, bit by bit with carry bits 𝑧𝑖:

𝑥1 + 𝑦1 = 1
𝑥2 + 𝑥1𝑦1 + 1 = 0 + 2𝑧1
1 + 𝑥2𝑦1 + 𝑥1 + 𝑧1 = 1 + 2𝑧2
𝑦1 + 𝑥2 + 𝑧2 = 1 + 2𝑧3
1 + 𝑧3 = 0 + 2𝑧4
𝑧4 = 1
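On a toy instance like this, the bit equations can simply be solved by enumeration, recovering the factors directly:

```python
from itertools import product

# Enumerate the unknown bits and keep assignments satisfying all six
# carry equations from the slide.
solutions = []
for x1, x2, y1, z1, z2, z3, z4 in product((0, 1), repeat=7):
    if (x1 + y1 == 1
            and x2 + x1 * y1 + 1 == 0 + 2 * z1
            and 1 + x2 * y1 + x1 + z1 == 1 + 2 * z2
            and y1 + x2 + z2 == 1 + 2 * z3
            and 1 + z3 == 0 + 2 * z4
            and z4 == 1):
        solutions.append((x1, x2, y1))

print(solutions)  # [(0, 1, 1)]: factors 1101b = 13 and 111b = 7, product 91
```

Of course, for cryptographic sizes this enumeration is exactly the exponential search the tricks on the following slides try to avoid.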
First Trick: Linearization
Replace 𝑥𝑖𝑦𝑗 by 𝜂𝑖𝑗 everywhere, and add constraints:
𝑥𝑖 + 𝑦𝑗 ≥ 2𝜂𝑖𝑗
𝑥𝑖 + 𝑦𝑗 − 1 ≤ 𝜂𝑖𝑗
0 ≤ 𝜂𝑖𝑗 ≤ 1

Key trick: for {𝑥, 𝑦, 𝜂} ∈ {0,1}:
𝑥 + 𝑦 − 1 ≤ 𝜂: 𝑥𝑦 = 1 → 𝜂 = 1
𝑥 + 𝑦 ≥ 2𝜂: 𝑥𝑦 = 0 → 𝜂 = 0
𝑥 + 𝑦 ≥ 2𝜂: 𝜂 = 1 → 𝑥𝑦 = 1
𝑥 + 𝑦 − 1 ≤ 𝜂: 𝜂 = 0 → 𝑥𝑦 = 0
so in {0,1}, 𝜂 = 𝑥𝑦.
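The key trick is small enough to verify exhaustively:

```python
from itertools import product

def feasible_etas(x, y):
    """Etas in {0, 1} satisfying the two linearization constraints."""
    return [eta for eta in (0, 1) if x + y >= 2 * eta and x + y - 1 <= eta]

# Over binary points the constraints pin eta to exactly x * y.
for x, y in product((0, 1), repeat=2):
    print((x, y), feasible_etas(x, y))
```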
Linearization, cont.
Theorem: Integer solutions of the {𝑥,𝑦,𝑧,𝜂} equations are in 1-1 correspondence with integer solutions of the {𝑥,𝑦,𝑧} equations. Given two corresponding solutions, the 𝑥,𝑦 and 𝑧 variables take the same values.
(Not immediately obvious: e.g. 𝑥1𝑦1 = 1, 𝑥2𝑦2 = 1 → 𝑥1𝑦2 = 1)
Second Trick: Quantization
Reduce feasible region via Linear Programming.
Maximize 𝑐·𝑥, 𝑥 ∈ 𝐹, with 𝑐 = (1, 0): max𝑥 𝑐·𝑥 = 0.95 → 𝑥1 = 0
Maximize 𝑐·𝑥, 𝑥 ∈ 𝐹, with 𝑐 = (1, 1): 𝑥1ᵐ = 1.0, 𝑥2ᵐ = 0.9, 𝑐·𝑥 = 1.9
Impose 𝑐·𝑥 ≤ 1:
{𝑥 : 𝑐·𝑥 ≤ 1} ∩ 𝐹
Quantization Without LPs
𝑥1 + 𝑥2 ≤ 1,  𝑥2 + 𝑥3 ≤ 1,  𝑥3 + 𝑥1 ≤ 1
→ 𝑥1 + 𝑥2 + 𝑥3 ≤ 1.5
→ 𝑥1 + 𝑥2 + 𝑥3 ≤ 1
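The rounding step can be checked by enumeration; note the tightened cut removes only fractional points of the LP relaxation, never binary points:

```python
from itertools import product

def pairwise_ok(x):
    """The three pairwise constraints from the slide."""
    return x[0] + x[1] <= 1 and x[1] + x[2] <= 1 and x[2] + x[0] <= 1

binary_feasible = [x for x in product((0, 1), repeat=3) if pairwise_ok(x)]
print(binary_feasible)  # only points with at most one coordinate set

# (0.5, 0.5, 0.5) satisfies the pairwise constraints but violates the
# tightened cut x1 + x2 + x3 <= 1, so the cut shrinks the LP relaxation.
print(pairwise_ok((0.5, 0.5, 0.5)), sum((0.5, 0.5, 0.5)) <= 1)
```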
More Simple Tricks
• Checking quantized versions of LP solutions is very fast (do the long division)
• Concentrate on the subspace.
• Work with the smallest dimensional subspaces that give new constraints.
• Randomized algorithms?
5081 * 6007 = 30521567:1001111011001 * 1011101110111 = 1110100011011100011011111
The Geometric View
Distance from origin to simplex: 1/√𝑛.  Volume of ‘corner’: 1/𝑛!.  Longest span inside unit cube: √𝑛.

Lemma: Denote the binary variables corresponding to vertex 𝑣 ∈ 𝒰 by 𝑏𝑖 ∈ {0,1}, 𝑖 = 1, …, 𝑛. Then the (un-normalized) normal to the hyperplane defined by the regular simplex which intersects all vertices which differ from 𝑣 by one (in one coordinate) is 𝑛 = 𝟙 − 2𝑣 (where 𝟙 is the vector of all ones), where the sign has been chosen such that 𝑛 at 𝑣 points into 𝒰. The equation of the corresponding hyperplane is 𝑥·𝑛 = 1 − 𝑝, where 𝑝 is the number of ones in 𝑣, and the corresponding constraint (delimiting the region lying inside 𝒰 but not including 𝑣) is 𝑥·𝑛 ≥ 1 − 𝑝.
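The three quantities quoted above are one-liners to compute:

```python
import math

def corner_geometry(n):
    """Unit-cube 'corner' quantities from the slide."""
    dist_to_simplex = 1 / math.sqrt(n)      # origin to hyperplane sum(x) = 1
    corner_volume = 1 / math.factorial(n)   # volume of {x >= 0, sum(x) <= 1}
    longest_span = math.sqrt(n)             # main diagonal of the unit cube
    return dist_to_simplex, corner_volume, longest_span

d, v, s = corner_geometry(3)
print(d, v, s)  # 0.577..., 1/6, 1.732...
```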
Projections Lose Information
How fast can randomized projections in subspaces find the solution?
Conclusions
• Search (and advertising) are likely to become more ubiquitous and better targeted
• Ranking algorithms are a key tool, and we can directly optimize finicky IR measures
• RSA is probably safe as houses, but we should probe it
A Simple Example

Two documents 𝐷1, 𝐷2, with labels 𝑙1 = 1, 𝑙2 = 0.

Imagine some cost C:
𝜆1 ≡ ∂C(𝑠1, 𝑙1, 𝑠2, 𝑙2) / ∂𝑠1
𝜆2 ≡ ∂C(𝑠1, 𝑙1, 𝑠2, 𝑙2) / ∂𝑠2
Letting 𝑥 ≡ 𝑠1 − 𝑠2, define:
𝑥 < 0:      𝜆1 = −1
0 ≤ 𝑥 < 1:  𝜆1 = 𝑥 − 1
𝑥 ≥ 1:      𝜆1 = 0
(and 𝜆2 = −𝜆1)

Then a cost function exists:
𝑥 < 0:      C(𝑠1, 𝑙1, 𝑠2, 𝑙2) = 𝑠2 − 𝑠1 + ½
0 ≤ 𝑥 < 1:  C(𝑠1, 𝑙1, 𝑠2, 𝑙2) = ½ (𝑠1 − 𝑠2 − 1)²
𝑥 ≥ 1:      C(𝑠1, 𝑙1, 𝑠2, 𝑙2) = 0

…furthermore it’s convex
LambdaRank
• Choose the 𝜆’s to model the desired cost. (Need not use pairs!)
• Very general. Handles multivariate, non-smooth costs.
• But, how to choose the 𝜆’s?
• When will there exist a cost function C for your choice of 𝜆’s?
• When will that C be convex?
Some Multilinear Algebra Basics
• An ‘n-form’ on a manifold M is a totally antisymmetric tensor that lives in the dual of the tangent space of M
• You can apply the differential operator d to an n-form to get an (n+1)-form
• A closed form f is one for which df = 0
• An exact form g is one for which g = dh, for some form h
• dd = 0 (every exact form is closed)
Poincaré’s Lemma

If S ⊆ ℝⁿ is an open set that is star-shaped with respect to the origin, then any closed form defined on S is exact.

Hence on such a set, a form is exact iff it is closed.

Define the 1-form 𝜆 ≡ Σᵢ 𝜆ᵢ d𝑥ᵢ.
Then 𝜆 = dC for some C iff d𝜆 = 0, i.e. iff ∂𝜆ᵢ/∂𝑥ⱼ = ∂𝜆ⱼ/∂𝑥ᵢ for all i, j.
Using classical notation: Jacobian symmetric!
The Jacobian
• Square matrix, of side nDocs
• Family of Jacobians, one for each label set
• Symmetric ⇒ a cost function exists
• Positive semidefinite ⇒ that cost function is convex
• (…like a kernel, but more general: depends on all points!)
A Physical Analogy
• Think of ranked documents as point masses, 𝜆’s as forces
• If d𝜆 = 0, the forces are conservative – they derive from a potential
• E.g. choosing the 𝜆’s to be linear in the scores is equivalent to a spring model
LambdaRank Speedup for RankNet
• Most neural net training is stochastic (update weights after every pattern)
• Here we can compute and increment the 𝜆 gradients for each document (mini-batch)
• Batch them, apply fprop and backprop once per doc, per query; factorize the gradient.