
Microsoft Research

Online Search and Advertising, Future and Present

Chris Burges, Microsoft Research

Saturday, Dec 13, 2008

Text Mining, Search and Navigation 1


Contents

• Search and Advertising – some ideas
  – Where are we headed?
  – How to begin?

• Some new results on ranking: we can directly learn Information Retrieval measures

• Internet security and RSA: why worry?


~ Search and Advertising ~


Why Search Works…

• Traditional: print, TV, radio, billboards,…
• Only very broadly targeted to demographics (some exceptions)
• Search is monetarily successful because advertising is more precisely targeted
• The Google model is giving and will continue to give traditional channels a run for their money


Key Points

• The online experience will be more deeply engaging.


What’s wrong with what we do now?

• Nothing, but… ten blue links + ads, ten years from now?
• Ads are ‘tacked on’ to the user experience.
• Paid Search / Contextual / Banner – all are still largely impersonal.
• But, Behavioral Targeting…


How might ads be targeted better?

• I just bought a car – don’t show me more ads for cars
• I just bought a house – show me ads for furniture
• I like band X, but not Y
• In general, build a model of what I’m in the market for
• Per-user pricing, availability
• User-driven asks (show me all ads for Z)


User Models

• User models can be used to enrich the online experience, not just advertising.
  – Automated teaching
  – Need a model of the user’s understanding.
• Find other users with similar interests
• Tailor news presentation to user’s interests


Key Points

• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge


What About Search?


Search: Somewhere in the Near Future


[Diagram: a query is mapped, via structured data, to a distribution over intents (e.g. 84% informational, 12% navigational, 4% transactional; 78% commercial, …); it is matched against indexed web data and structured data (diversity; popular pages; aiding the transaction), refined through human–computer dialog, and rendered in the display.]


How to get the information we need, to build good models for users?

Ask them!


Key Points

• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge, and more.
• Natural Language Processing will be key.


Search Applications: And, Data Changes Everything


• Example: AskMSR (Brill, Dumais, Banko, ACL 2002)
• Commonly used resources for QA:
  – Part-of-speech tagger, parser, named-entity extractor, WordNet or other knowledge bases, passage or sentence retrieval, abduction, etc.
• AskMSR doesn’t use any of them
• Instead, AskMSR focuses on data:
  – There is a lot of data on the web – use it
  – Redundancy is a resource to be exploited
• Data-driven QA: simple techniques, lots of data


Data Changes Everything

Banko and Brill, Mitigating…, ICHLTR 2001


Data Changes Everything

Banko and Brill, Scaling…, 2001


Key Points

• The online experience will be more deeply engaging.
• We will need rich state models of users: likes, dislikes, ± interests, knowledge, and more.
• Natural Language Processing will be key.
• “Search” can be the engine under the hood for many different applications.
• It’s better to use tons of data and simple models, versus smaller datasets and complex models.


How to proceed?

• Don’t know. But: Sam, a Search Chatbot.
  – Provide an engaging chat experience
  – Use Search to show images, urls, videos,…
  – Will build persistent user world models
  – Will have its own world model
  – Can show precisely targeted ads
  – Will leverage social networks


The Eliza Effect

• Eliza: J. Weizenbaum, 1966 (!)
• Demonstrated that extremely simple techniques can result in compelling dialog (sometimes, for some users)
• Users tend to anthropomorphize computer behavior
• This gives us an advantage


Our Prime Directive in Building Sam:


Do as little supervision as possible.


Let the Data do the Work

• anarchism – category: anarchism
• anarchism – category: political ideologies
• anarchism – category: political philosophies
• anarchism – category: social philosophy
• autism – category: autism
• autism – category: pervasive developmental disorders
• autism – category: childhood psychiatric disorders
• autism – category: communication disorders
• autism – category: neurological disorders
• albedo – category: electromagnetic radiation
• albedo – category: climatology
• albedo – category: climate forcing
• albedo – category: scattering, absorption and radiative transfer (optics)
• albedo – category: radiometry
• abu dhabi – category: abu dhabi
• abu dhabi – category: capitals in asia
• abu dhabi – category: cities in the united arab emirates
• abu dhabi – category: coastal cities
• a – category: latin letters
• a – category: vowel letters

Robert Rounthwaite, TMSN

Using Category Graphs to Drive Dialog

– User: I like ferrets.
– Ferret: category: animals people keep as pets
– Animals people keep as pets: rabbits
– Sam: Do you like rabbits, too?


Use ODP and Wikipedia hierarchies to construct graph
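A minimal sketch of how such a category graph might drive a follow-up question, assuming a toy hand-built graph; `CATEGORY_GRAPH` and `suggest_followup` are illustrative names, not the actual system (which builds its graph from ODP and Wikipedia):

```python
# Toy category graph; the real one is constructed from ODP and Wikipedia hierarchies.
CATEGORY_GRAPH = {
    "ferret": ["animals people keep as pets"],
    "rabbit": ["animals people keep as pets"],
    "animals people keep as pets": ["ferret", "rabbit"],
}

def suggest_followup(liked_topic, already_discussed):
    """Walk one hop up to a category, then down to a sibling topic to ask about."""
    for category in CATEGORY_GRAPH.get(liked_topic, []):
        for sibling in CATEGORY_GRAPH.get(category, []):
            if sibling != liked_topic and sibling not in already_discussed:
                return f"Do you like {sibling}s, too?"
    return None

print(suggest_followup("ferret", set()))  # -> Do you like rabbits, too?
```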


Use Category Graphs to Build Models

• Attach a vector to each node, sparsely:
  – [like/dislike; interested/not; knows about; …]
• Each component has a confidence level
• Leverage graph structure to explore


“World model” for both user and for Sam


Other Useful Sources of Data


[Figure: a query–URL click graph, with query nodes qi, qi+1, qi+2, qi+3, … connected to URL nodes uj, uj+1, uj+2, …; hitting times of random walks on this graph drive query suggestion.]

Q. Mei, D. Zhou, K. Church, Query Suggestion Using Hitting Time, CIKM 2008


Temporal Querying Behavior

[Plot: lift in probability of the queries “resume” and “moving” (y-axis, roughly 0.4 to 2) versus days since the query “interview” (x-axis, −300 to 300).]

M. Richardson, ACM Transactions on the Web 2008



We Are Not Alone

• http://quizbot.trueknowledge.com/
• http://www.jabberwacky.com/


Jab: Do you like computers?
Me: Yes, but only when they are turned on.
Jab: I like ferraris, do you like them also?
Me: Not really. Why do you like them?
Jab: Yes.
Me: Why do you like them?
Jab: No, I like going to mars.


One Possible Sentence Generator

• Inputs:
  – Sentiment
  – Distribution over topics under discussion
  – Features from recent sentences
  – Sentence or phrase database (with statistics)
  – Distributions over the user’s likes / interests, etc.
  – Close or popular nodes where the bot lacks knowledge of the user
  – Topic priors
• Output: ranked sentences


New Challenges for Machine Learning

• How can we teach a chatbot to talk?
  – “Good / bad response” buttons: reinforcement learning?
  – ESP-like games for labeling, for learning to rank sentences?
  – Build natural sentences from phrases?
• How can we learn effective user models?
  – Combine from multiple users to form good priors
  – Use active learning during chat to reduce uncertainty in the user’s model


Demo


Joint work with Scott Imig, Silviu Cucerzan

S. Cucerzan, Large-Scale Named Entity Disambiguation Based on Wikipedia Data, Proc. 2007 Joint Conference on EMNLP and CoNLL


~ Some New Results on Ranking ~


Empirical Optimality of λ-Rank

Joint work with:
– Pinar Donmez (CMU)
– Krysta Svore (MSR)
– Yisong Yue (Cornell)


Some IR Measures

• Precision: P = (# relevant retrieved) / (# retrieved); Recall: R = (# relevant retrieved) / (# relevant)
• Average Precision: compute precision at the position of each positive, average over all such positions
• Mean Average Precision: average AP over queries
• Mean Reciprocal Rank (TREC QA): 1 / (rank of first relevant result), averaged over queries
• Mean NDCG: NDCG = (1/Z) Σ_i (2^{l_i} − 1) / log(1 + i), with Z chosen so the ideal ordering scores 1, averaged over queries
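These measures can be stated precisely in a few lines; a hedged sketch of the standard per-query definitions (AP and RR for binary labels, NDCG with 2^l − 1 gains and log2 discounts):

```python
import math

def average_precision(labels):
    """labels: 0/1 relevance of documents in ranked order."""
    hits, total = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / i          # precision at this positive's position
    return total / hits if hits else 0.0

def reciprocal_rank(labels):
    """1 / rank of the first relevant document, 0 if none."""
    for i, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def ndcg(labels, k=None):
    """labels: graded relevance in ranked order; gain 2^l - 1, log2 discount."""
    k = k or len(labels)
    dcg = sum((2**l - 1) / math.log2(i + 1) for i, l in enumerate(labels[:k], start=1))
    ideal = sum((2**l - 1) / math.log2(i + 1)
                for i, l in enumerate(sorted(labels, reverse=True)[:k], start=1))
    return dcg / ideal if ideal else 0.0
```

The mean versions simply average these over queries.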


IR Measures, cont.


These measures:

• Depend only on the labels and the sorted order of the documents
• Viewed as a function of the scores output by some model, are everywhere either flat or discontinuous
  – SVM MAP: Yue et al., SIGIR ’07
  – Tao Qin, Tie-Yan Liu, Hang Li, MSR Tech Report 164 (2008)


LambdaRank: Background


The RankNet Cost


Define o ≡ o1 − o2, the difference of the model scores for the two documents.

Model output probabilities using the logistic: modeled posterior P = e^o / (1 + e^o)

Target posterior: P̄ (from the labels)

Cross-entropy cost: C = −P̄ log P − (1 − P̄) log(1 − P) = (1 − P̄) o + log(1 + e^{−o})
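A sketch of this cost as code, using the algebraically simplified form above (o is the score difference, target_p the target posterior P̄; valid for moderate o):

```python
import math

def ranknet_cost(o, target_p):
    """Pairwise cross-entropy cost -Pbar*log(P) - (1-Pbar)*log(1-P),
    with modeled posterior P = 1/(1 + exp(-o)); simplified form below."""
    return math.log(1.0 + math.exp(-o)) + (1.0 - target_p) * o
```

At o = 0 the cost is log 2 for any target; for P̄ = 1 it vanishes as o grows, and for P̄ = 0 it grows linearly in o.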


[Plot: the cost C(o1 − o2) versus o1 − o2 ∈ [−5, 5], for target posteriors P = 0.0, 0.5, and 1.0.]


RankNet Cost ~ Pairwise Cost


[Plot: the cost C(o1 − o2) versus o1 − o2 ∈ [−5, 5]; away from the origin it grows linearly in the score difference, like a pairwise cost.]


Pairwise Cost Revisited

Pairwise cost fine if no errors, but:

[Figure: two example rankings of the same documents, with 13 and 11 pairwise errors respectively.]


LambdaRank

Instead of using a smooth approximation to the cost, and taking derivatives, write down the derivatives directly.

Then use these derivatives to train a model using gradient descent, as usual.

[Figure: scores s1, s2 with the λ’s acting as forces on each document.]


The Lambda Function


NDCG gain in swapping members of a pair of docs, multiplied by the RankNet cost gradient as a smoother:

For a pair (i, j) with document i labeled higher than document j, let ΔNDCG_ij be the NDCG change from swapping the two documents’ rank positions, and set λ_ij = |ΔNDCG_ij| / (1 + e^{s_i − s_j}). Let D_j↑ (D_j↓) be the set of documents labeled higher (lower) than document j: document j then accumulates −λ_ij from each i ∈ D_j↑ and +λ_ji from each i ∈ D_j↓.
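A sketch of these λ’s for one query, with the sign convention that positive λ pushes a document up; `dcg_gain` and the unnormalized ΔNDCG here are assumptions of this sketch, not the exact production formula:

```python
import math

def dcg_gain(label, rank):
    # DCG contribution of a document with this label at this 1-based rank
    return (2 ** label - 1) / math.log2(rank + 1)

def lambdas(scores, labels):
    """Per-document lambda gradients for one query."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = {doc: r + 1 for r, doc in enumerate(order)}
    lam = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # |NDCG change| (up to the ideal-DCG normalizer) if i and j swap
                delta = abs(dcg_gain(labels[i], rank[i]) + dcg_gain(labels[j], rank[j])
                            - dcg_gain(labels[i], rank[j]) - dcg_gain(labels[j], rank[i]))
                smoother = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                lam[i] += delta * smoother   # push the better-labeled doc up
                lam[j] -= delta * smoother   # push the worse-labeled doc down
    return lam
```

For a mis-ranked pair (relevant document scored below an irrelevant one), the relevant document gets a positive λ and the irrelevant one the opposite.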


Lambda Functions for MAP, MRR



Local Optimality

• Check the gradient vanishes at the solution.
• Get a bound on the probability that we’re not at a local max, using one-sided Monte Carlo


P(we miss an ascent direction despite k trials): if at least a fraction p of directions are ascent directions, then k independent random trials all miss with probability at most (1 − p)^k.

How large must k be for (1 − p)^k ≤ η?

Answer: k ≥ log η / log(1 − p)
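The slide’s specific numbers are lost, but the bound has a standard closed form; a sketch assuming at least a fraction p of sampled directions would be ascent directions at a non-optimal point:

```python
import math

def trials_needed(p, eta):
    """Smallest k with (1 - p)**k <= eta: k independent random directions
    all miss an ascent direction with probability at most (1 - p)**k."""
    return math.ceil(math.log(eta) / math.log(1.0 - p))
```

For example, with p = 1% and a failure probability of 10^−6, a bit under 1400 trials suffice.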


Data Sets

• Artificial: 300 features, 50 urls/query, 10k/5k/10k train/valid/test split
• Web 1: 420 features, 26 urls/query, 10k/5k/10k split
• Web 2: 30k/5k/10k split


Which function to choose?


Lambda Gradient        MAP ± SE          MRR ± SE
RankNetWeightPairs     0.462 ± 0.0048    0.524 ± 0.0059
LocalGradient          0.435 ± 0.0048    0.515 ± 0.0060
LocalCost              0.427 ± 0.0049    0.512 ± 0.0059
SpringSmooth           0.424 ± 0.0048    0.498 ± 0.0058
DiscreteGradient       0.401 ± 0.0049    0.471 ± 0.0059

• LocalGradient: finite element estimate of gradient, with margin
• LocalCost: estimate local gradient using neighbors + weighted RankNet cost
• SpringSmooth: smoother version of RankNetWeightPairs
• DiscreteGradient: finite element estimate using optimal position

[Results charts for the 10K Web, 30K Web, and Artificial datasets.]


Sample Size Matters


Test measure   Train measure   10K Train   30K Train
NDCG           NDCG            0.416       0.428
NDCG           MAP             0.412       0.422
NDCG           MRR             0.396       0.406
MAP            NDCG            0.442       0.453
MAP            MAP             0.439       0.456
MAP            MRR             0.429       0.449
MRR            NDCG            0.519       0.532
MRR            MAP             0.516       0.533
MRR            MRR             0.508       0.537

• Number of pairs drops by >2× for MRR and MAP
• For MRR, the number of samples drops much further


IR Measure Optimality - Conclusions

• Typically, IR practitioners would train models with small numbers of ‘smart’ features (~ BM25), and perform grid search

• However, adding many weak features improves performance

• We have shown that the LambdaRank gradients optimize three IR measures directly


~ RSA, Factoring, and Optimization ~


Factoring biprimes as optimization


• The security of internet commerce (SSL, RSA) rests on a mathematical conjecture, namely, that factoring biprimes is combinatorially hard.
• Conjectures aren’t necessarily true. If this conjecture is false, there is no simple backup plan.
• The current fastest known factoring method is the general number field sieve. Its heuristic running time is exp(((64/9)^{1/3} + o(1)) (ln N)^{1/3} (ln ln N)^{2/3}).
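The growth is easiest to feel numerically; a sketch evaluating the exponent of the heuristic GNFS running time (ignoring the o(1) term) for two common modulus sizes:

```python
import math

def gnfs_log_cost(bits):
    """Natural log of the heuristic GNFS running time,
    exp((64/9)^(1/3) * (ln N)^(1/3) * (ln ln N)^(2/3)), for an N of this bit length."""
    ln_n = bits * math.log(2)
    return (64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)

# Relative cost of doubling the modulus from 512 to 1024 bits:
ratio = math.exp(gnfs_log_cost(1024) - gnfs_log_cost(512))
```

By this estimate a 1024-bit modulus is millions of times harder to factor than a 512-bit one, which matches the history of the RSA Challenge numbers below.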


Circumstantial Evidence That Factoring is Not NP-hard

• There are very few known problems that quantum computers could solve exponentially faster than classical computers.
• Factoring is one of them (Shor, ’94). The “discrete logarithm” is one more.
• Much work since then to find a quantum algorithm that solves an NP-complete problem has failed. A quantum computer must use domain knowledge (S. Aaronson, 2008).
• Searching an unstructured list of solutions gives only a quadratic, not exponential, speedup (Grover, ’96)


Is This The Best We Can Do?

Even exponential complexity, but with better N dependence, would be interesting.


RSA Challenge

RSA Number / Decimal digits / Binary digits / Cash prize offered / Factored on / Factored by

RSA-100 100 330   April 1991 Arjen K. Lenstra

RSA-110 110 364   April 1992 Arjen K. Lenstra and M.S. Manasse

RSA-120 120 397   June 1993 T. Denny et al.

RSA-129 129 426 $100 USD April 1994 Arjen K. Lenstra et al.

RSA-130 130 430   April 10, 1996 Arjen K. Lenstra et al.

RSA-140 140 463   February 2, 1999 Herman J. J. te Riele et al.

RSA-150[3] 150 496   April 16, 2004 Kazumaro Aoki et al.

RSA-155 155 512   August 22, 1999 Herman J. J. te Riele et al.

RSA-160 160 530   April 1, 2003 Jens Franke et al., University of Bonn

RSA-170 170 563   open

RSA-576 174 576 $10,000 USD December 3, 2003 Jens Franke et al., University of Bonn

RSA-180 180 596   open

RSA-190 190 629   open

RSA-640 193 640 $20,000 USD November 2, 2005 Jens Franke et al., University of Bonn

RSA-200 200 663   May 9, 2005 Jens Franke et al., University of Bonn

RSA-210 210 696   open

RSA-704 212 704 $30,000 USD open

RSA-768 232 768 $50,000 USD open

RSA-896 270 896 $75,000 USD open

RSA-1024 309 1024 $100,000 USD open

RSA-1536 463 1536 $150,000 USD open

RSA-2048 617 2048 $200,000 USD open

(Wikipedia)


Represent the Problem in Binary

Example: factor 1011011 (binary; decimal 91) as (1 x2 x1 1) × (1 y1 1). Writing out the long multiplication and equating the product bits column by column, with carry bits z1, …, z4:

x1 + y1 = 1
x2 + x1 y1 + 1 = 0 + 2 z1
1 + x2 y1 + x1 + z1 = 1 + 2 z2
y1 + x2 + z2 = 1 + 2 z3
1 + z3 = 0 + 2 z4
z4 = 1
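For a toy instance this small, the unknown bits can simply be brute-forced; a sketch assuming (as read off this slide) that the product is 1011011 in binary (decimal 91) and the factors have the forms (1 x2 x1 1) and (1 y1 1):

```python
from itertools import product

# Brute-force the three unknown bits of the two binary factors.
solutions = []
for x1, x2, y1 in product((0, 1), repeat=3):
    p = (8 + 4 * x2 + 2 * x1 + 1) * (4 + 2 * y1 + 1)
    if p == 0b1011011:            # 91
        solutions.append((x1, x2, y1))

print(solutions)  # x1=0, x2=1, y1=1: 13 * 7 = 91
```

The unique solution satisfies every column equation above, with all carries z1 = z2 = z3 = z4 = 1.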


First Trick: Linearization

Replace xi yj by ηij everywhere, and add constraints:
xi + yj ≥ 2 ηij
xi + yj − 1 ≤ ηij
0 ≤ ηij ≤ 1

Key trick: for x, y, η ∈ {0,1}:
x + y − 1 ≤ η: x y = 1 → η = 1
x + y ≥ 2 η: x y = 0 → η = 0
x + y ≥ 2 η: η = 1 → x y = 1
x + y − 1 ≤ η: η = 0 → x y = 0
so in {0,1}, η = x y.
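The key trick is easy to verify by exhaustion; a minimal sketch:

```python
from itertools import product

def linearization_is_exact():
    """Over binary x, y, eta: the two linear constraints hold iff eta == x*y."""
    for x, y, eta in product((0, 1), repeat=3):
        feasible = (x + y >= 2 * eta) and (x + y - 1 <= eta)
        if feasible != (eta == x * y):
            return False
    return True

print(linearization_is_exact())  # True
```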


Linearization, cont.

Theorem: Integer solutions of the {x, y, z, η} equations are in 1-1 correspondence with integer solutions of the {x, y, z} equations. Given two corresponding solutions, the x, y and z variables take the same values.

(Not immediately obvious: e.g. x1 y1 = 1, x2 y2 = 1 → x1 y2 = 1)


A Geometrical Problem


Second Trick: Quantization

Reduce feasible region via Linear Programming.

Maximize c·x, x ∈ F, with c = (1, 0): max_x c·x = 0.95, so x1 ≤ 0.95 everywhere in F, and the integer x1 must be 0.


Maximize c·x, x ∈ F, with c = (1, 1): x1 = 1.0, x2 = 0.9 at the maximum, so c·x = 1.9.

Since c·x is an integer at any 0/1 point, impose c·x ≤ 1:

{x : c·x ≤ 1} ∩ F


Quantization Without LPs

x1 + x2 ≤ 1
x2 + x3 ≤ 1
x3 + x1 ≤ 1

→ x1 + x2 + x3 ≤ 1.5 (sum the three constraints and divide by 2)
→ x1 + x2 + x3 ≤ 1 (the left side is an integer at 0/1 points, so round down)
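A quick exhaustive check that the rounded cut is valid, i.e. that it cuts off no 0/1 point satisfying the pairwise constraints:

```python
from itertools import product

def cut_is_valid():
    """Every binary point satisfying the three pairwise constraints
    also satisfies x1 + x2 + x3 <= 1."""
    for x1, x2, x3 in product((0, 1), repeat=3):
        pairwise = x1 + x2 <= 1 and x2 + x3 <= 1 and x3 + x1 <= 1
        if pairwise and x1 + x2 + x3 > 1:
            return False
    return True

print(cut_is_valid())  # True
```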


More Simple Tricks

• Checking quantized versions of LP solutions is very fast (do the long division)
• Concentrate on the subspace.
• Work with the smallest dimensional subspaces that give new constraints.
• Randomized algorithms?

5081 * 6007 = 30521567:1001111011001 * 1011101110111 = 1110100011011100011011111


The Geometric View

Distance from origin to simplex: 1/√n. Volume of ‘corner’: 1/n!. Longest span inside unit cube: √n.

Lemma: Denote the binary variables corresponding to vertex v ∈ U by bi ∈ {0,1}, i = 1, …, n. Then the (un-normalized) normal to the hyperplane defined by the regular simplex which intersects all vertices which differ from v by one (in one coordinate) is n = 1 − 2v (where 1 is the vector of all ones), where the sign has been chosen such that n at v points into U. The equation of the corresponding hyperplane is x·n = 1 − p, where p is the number of ones in v, and the corresponding constraint (delimiting the region lying inside U but not including v) is x·n ≥ 1 − p.
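The lemma can be checked numerically on small cubes; a sketch (v a cube vertex with p ones, n = 1 − 2v, u any one-coordinate-flip neighbor):

```python
from itertools import product

def lemma_holds(n_dims):
    """For every cube vertex v, all one-flip neighbors u lie on x.n = 1 - p."""
    for v in product((0, 1), repeat=n_dims):
        p = sum(v)
        normal = [1 - 2 * b for b in v]
        for k in range(n_dims):
            u = list(v)
            u[k] = 1 - u[k]                  # flip one coordinate
            if sum(ui * ni for ui, ni in zip(u, normal)) != 1 - p:
                return False
    return True

print(lemma_holds(4))  # True
```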


Projections Lose Information


How fast can randomized projections in subspaces find the solution?


Conclusions

• Search (and advertising) are likely to become more ubiquitous and better targeted

• Ranking algorithms are a key tool, and we can directly optimize finicky IR measures

• RSA is probably safe as houses, but we should probe it


Backup Slides


A Simple Example

Two documents D1, D2, with labels l1 = 1, l2 = 0.

Imagine some cost C, and define

λ1 ≡ ∂C(s1, l1, s2, l2) / ∂s1
λ2 ≡ ∂C(s1, l1, s2, l2) / ∂s2


Letting x ≡ s1 − s2, and choosing the λ’s as piecewise functions of x, one can integrate them: a cost function C(s1, l1, s2, l2) with these gradients exists.

…furthermore it’s convex


LambdaRank

• Choose the λ’s to model the desired cost. (Need not use pairs!)
• Very general. Handles multivariate, non-smooth costs.
• But, how to choose the λ’s?
• When will there exist a cost function C for your choice of λ’s?
• When will that C be convex?


Some Multilinear Algebra Basics

• An ‘n-form’ on a manifold M is a totally antisymmetric tensor that lives in the dual of the tangent space of M
• You can apply the differential operator d to an n-form to get an (n+1)-form
• A closed form f is one for which df = 0
• An exact form g is one for which g = dh, for some form h
• dd = 0 (every exact form is closed)


Poincaré’s Lemma

If S ⊂ R^n is an open set that is star-shaped with respect to the origin, then any closed form defined on S is exact.

Hence on such a set, a form is exact iff it is closed.

Define the 1-form λ = Σ_i λ_i dx_i. Then λ = dC for some C iff dλ = 0.

Using classical notation, ∂λ_i/∂x_j = ∂λ_j/∂x_i: Jacobian symmetric!


The Jacobian

• Square matrix, of side nDocs
• Family of Jacobians, one for each label set
• Symmetric → a cost function exists
• Positive semidefinite → the cost function is convex
• (…like a kernel, but more general: depends on all points!)


A Physical Analogy

• Think of ranked documents as point masses, the λ’s as forces
• If dλ = 0, the forces are conservative – they derive from a potential
• E.g. choosing the λ’s to be linear in the scores is equivalent to a spring model


LambdaRank Speedup for RankNet

• Most neural net training is stochastic (update weights after every pattern)

• Here we can compute and increment the gradients for each document (mini batch)

• Batch them, apply fprop and backprop once per doc, per query; factorize the gradient.


Speedup Results
