Page 1: Multi-Label Learning with Millions of Labels for Query …manikvarma.org/talks/ · apple iphone 3g metallic slim fit case best iphone nn4 cases iphone cases best iphone cases apple

Archit Gupta IIT Delhi

Rahul Agrawal Microsoft AdCenter

Manik Varma Microsoft Research India

Yashoteja Prabhu Microsoft Research India

Multi-Label Learning with Millions of Labels for Query Recommendation

Page 2

Recommending Advertiser Bid Phrases

geico auto insurance

geico car insurance

geico insurance

www geico com

care geicos

geico com

need cheap auto insurance

wisconsin cheap car insurance quotes

cheap auto insurance florida

all state car insurance coupon code

Page 3

Query Rewriting

geico auto insurance

geico car insurance

geico insurance

www geico com

care geicos

geico com

need cheap auto insurance

wisconsin cheap car insurance quotes

cheap auto insurance florida

all state car insurance coupon code

“Absolutely cheapest car insurance”

Page 4

Ranking & Relevance Meta Stream

geico auto insurance

geico car insurance

geico insurance

www geico com

care geicos

geico com

need cheap auto insurance

wisconsin cheap car insurance quotes

cheap auto insurance florida

geico twitter

Page 5

Recommending Advertiser Bid Phrases

geico auto insurance

geico car insurance

geico insurance

www geico com

care geicos

geico com

need cheap auto insurance

wisconsin cheap car insurance quotes

cheap auto insurance florida

all state car insurance coupon code

Page 6

Learning to Predict a Set of Queries

X: Ads Y: Queries

$f : X \to 2^{Y}$

car insurance

italian restaurant

iphone

geico online quote

need cheap auto insurance

Page 7

f([ad image])

Learning to Predict a Set of Queries

need cheap auto insurance

geico

car insurance

Page 8

Multi-Label Learning Challenges

• Infinite number of labels (queries)
• Training data acquisition
• Efficient training
• Cost of prediction

f([ad image])

need cheap auto insurance

geico

car insurance

Page 9

Binary Classification & Ranking

• Infinite number of labels (queries)
• Training data acquisition
• Efficient training
• Cost of prediction

h : (X, Y) → {relevant, irrelevant}

h([ad image], geico) → [relevance icon]

h([ad image], iphone) → [relevance icon]

Page 10

Binary Classification

• Infinite number of labels (queries)
• Training data acquisition
• Efficient training
• Cost of prediction

car insurance

italian restaurant

iphone

geico online quote

need cheap auto insurance

h : (X, Y) → {relevant, irrelevant}

Page 11

Binary Classification – KEX

• Infinite number of labels (queries)
• Training data acquisition
• Efficient training
• Cost of prediction

h : (X, Y) → {relevant, irrelevant}

car insurance

geico online quote

switching to geico

Page 12

Query Recommendations by KEX

Page 13

Query Recommendations by KEX

h([ad image], car insurance) → ?

h([ad image], iphone) → ?

Page 14

Query Recommendations by KEX

plastic ponies

simone

plastics

clothing and accessories

sylvia

pony clothing

couture

playground

plastic recycling

children's clothing

Page 15

Multi-Label Learning Formulation

X: Ads Y: Queries

car insurance

italian restaurant

iphone

geico online quote

need cheap auto insurance

$f : X \to 2^{Y}$

Page 16

Learning with Millions of Labels

X: Ads Y: 10 Million Queries

car insurance

italian restaurant

iphone

geico online quote

need cheap auto insurance

$f : X \to 2^{Y}$

Page 17

Multi-Label Random Forests

• We develop Multi-Label Random Forests with logarithmic prediction costs that make predictions in a few milliseconds.
• We train on 200 M points, 100 M categories and 10 M features in 28 hours on a grid with 1000 compute nodes.
• We develop a tree-growing criterion that learns from positive data alone.
• We generate training data automatically from click logs.
• We develop a sparse semi-supervised learning (SSL) formulation to infer beliefs about the state of missing and noisy labels.

Page 18

Training Data – Missing Labels

• No annotator can mark all the relevant labels for a data point.
• We have missing labels during training, validation and testing.
• Even fundamental ML techniques such as validation can go awry.
• One can’t design error metrics invariant to missing labels.

Page 19

Training Data and Features

TF-IDF Bag of Words Features

iphone color

material

Page 20

Training Labels

TF-IDF Bag of Words Features

iphone color

material

case for iphone best iphone case apple iphone 3g metallic slim fit

case best iphone nn4 cases

iphone cases best iphone cases apple iphone 4g cases black white premium bumper

case apple iphone nn4 att

best iphone nn4 case case iphone apple iphone 4g premium soft

silicone rubber black phone protector skin cover case

bunny rabbit silicone case skin iphone nn4 stand tail holder

iphone 3gs cases otterbox universal defender case iphone nn4 black silicone black

plastic apple iphone nn4 cases iphone case

iphone 4s case sena iphone cases belkin grip vue tint case iphone

nn4 clear iphone 4g cases

iphone case speck iphone case best case iphone 4s iphone 4gs cases

iphone nn4 case switcheasy neo case iphone 3g

black best case iphone nn4 iphone 4s defender series case

3g iphone cases waterproof iphone case best iphone 3g cases iphone case design

apple iphone cases waterproof iphone cases best iphone 4s case iphone cases 3g

best iphone 3g case amazonbasics protective tpu case

screen protector att verizon iphone nn4 iphone 4s clear

best iphone 4s cases iphone cases 4g

Page 21

Training Labels

Page 22

Missing and Noisy Labels

best italian restaurants philadelphia

italian restaurant chains

italian restaurants italian restaurant connecticut

italian restaurant italian restaurant district columbia

italian restaurants arkansas thai restaurant

italian restaurants connecticut thai restaurants

italian restaurants idaho restaurants

italian restaurants phoenix mexican restaurants

Page 23

Missing and Noisy Labels

Page 24

Biased Training Data

[Figure: Zipf's law; query frequency plotted against query]

• Most labels will have very few positive training examples
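To make the Zipf's-law point concrete, here is a small sketch under two loudly stated assumptions: label frequencies follow an idealized Zipf law with exponent 1 (the label at rank r gets a share of positives proportional to 1/r), and the corpus sizes below are hypothetical, not the talk's data.

```python
def zipf_counts(num_labels: int, total_positives: int, exponent: float = 1.0) -> list[float]:
    """Expected positive examples per label under an idealized Zipf law (assumption)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_labels + 1)]
    norm = sum(weights)
    return [total_positives * w / norm for w in weights]


def fraction_below(counts: list[float], threshold: float) -> float:
    """Fraction of labels whose expected positive count is below `threshold`."""
    return sum(1 for c in counts if c < threshold) / len(counts)


if __name__ == "__main__":
    # Hypothetical corpus: 1M labels sharing 10M positive (ad, query) pairs.
    counts = zipf_counts(num_labels=1_000_000, total_positives=10_000_000)
    print(f"head label: {counts[0]:.0f} positives")
    print(f"labels with < 10 positives: {fraction_below(counts, 10):.1%}")
```

With these made-up numbers the head label gets hundreds of thousands of positives while roughly 93% of labels expect fewer than 10, which is the long tail the slide warns about.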

Page 25

Multi-Label Prediction Costs

1-vs-All Classification

• Linear prediction costs are infeasible

geico

car insurance

iphone cases

pizza
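Why 1-vs-All prediction is linear in the number of labels can be seen in a minimal sketch: one linear scorer per label, and every scorer is evaluated for every ad. The sparse-dictionary feature format, the `one_vs_all_top_k` helper and the toy weights below are hypothetical illustrations; the label names are borrowed from the slide.

```python
import heapq

SparseVec = dict[str, float]  # feature name -> value (e.g. a TF-IDF weight)


def dot(a: SparseVec, b: SparseVec) -> float:
    """Sparse dot product, iterating over the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def one_vs_all_top_k(ad: SparseVec, label_weights: dict[str, SparseVec], k: int):
    """Score every label with its own linear classifier and keep the k best.

    Returns (top_k_labels, number_of_dot_products). The second value always
    equals len(label_weights): the cost grows linearly with the label count.
    """
    scores = {label: dot(ad, w) for label, w in label_weights.items()}
    top = heapq.nlargest(k, scores, key=scores.get)
    return top, len(scores)


# Toy example with made-up weights for four labels.
ad = {"geico": 0.8, "insurance": 0.6}
weights = {
    "geico": {"geico": 1.0},
    "car insurance": {"insurance": 1.0, "car": 0.5},
    "iphone cases": {"iphone": 1.0},
    "pizza": {"pizza": 1.0},
}
top, cost = one_vs_all_top_k(ad, weights, k=2)
print(top, cost)
```

With 10 million labels the same loop would perform 10 million dot products per ad, which is why the slide calls linear prediction costs infeasible.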

Page 26

Label and Feature Space Compression

1K Dimensional Embedding Space

10M Dimensional Label Space

6M Dimensional Feature Space

car

auto

motor vehicle

iphone cases

cases iphone

Car Ads

iphone Case Ads

iphone case

Page 27

Hierarchical Prediction

• Prediction in logarithmic time

Page 28

Gating Tree Based Prediction

[Figure: a gating tree whose nodes test 'Is the word “insurance” present in the ad?' and 'Is the word “geico” present in the ad?', with Yes/No branches and a label distribution (axis 0 to 0.4) at each leaf]

• Prediction in logarithmic time
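A gating tree routes an ad along a single root-to-leaf path of word tests, so prediction cost depends on tree depth (logarithmic in the number of leaves for a balanced tree) rather than on the number of labels. The sketch below mirrors the slide's two word tests; the `Node` class, tree shape and leaf distributions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Internal nodes test for a word; leaves carry a label distribution."""
    word: str | None = None   # word tested at an internal node (None for a leaf)
    yes: Node | None = None   # child followed when the word is present
    no: Node | None = None    # child followed when the word is absent
    distribution: dict[str, float] = field(default_factory=dict)  # leaves only

    def is_leaf(self) -> bool:
        return self.word is None


def predict(tree: Node, ad_words: set[str]) -> dict[str, float]:
    """Follow one root-to-leaf path and return that leaf's label distribution."""
    node = tree
    while not node.is_leaf():
        node = node.yes if node.word in ad_words else node.no
    return node.distribution


# Hypothetical tree mirroring the slide's tests on "insurance" and "geico".
tree = Node(
    word="insurance",
    yes=Node(
        word="geico",
        yes=Node(distribution={"geico": 0.4, "geico car insurance": 0.35, "car insurance": 0.25}),
        no=Node(distribution={"car insurance": 0.5, "cheap auto insurance": 0.5}),
    ),
    no=Node(distribution={"iphone cases": 0.3, "pizza": 0.3, "italian restaurant": 0.4}),
)
print(predict(tree, {"geico", "car", "insurance"}))
```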

Page 29

Ensemble of Randomized Gating Trees

[Figure: three randomized gating trees, each with leaf label distributions (axis 0 to 0.4)]
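In an ensemble, each randomized tree returns the label distribution of the leaf the ad reaches. A common way to combine them, assumed here rather than taken from the talk, is to average the distributions and rank labels by the averaged score. The leaf distributions below are made up.

```python
from collections import defaultdict


def ensemble_top_k(leaf_distributions: list[dict[str, float]], k: int) -> list[tuple[str, float]]:
    """Average per-tree leaf distributions and return the k highest-scoring labels."""
    totals: dict[str, float] = defaultdict(float)
    for dist in leaf_distributions:
        for label, p in dist.items():
            totals[label] += p
    n_trees = len(leaf_distributions)
    averaged = {label: s / n_trees for label, s in totals.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical leaf distributions reached by one ad in three trees.
leaves = [
    {"geico": 0.4, "car insurance": 0.4, "geico online quote": 0.2},
    {"geico": 0.5, "need cheap auto insurance": 0.5},
    {"car insurance": 0.6, "geico": 0.4},
]
print(ensemble_top_k(leaves, k=2))
```

Averaging lets labels that several trees agree on ("geico" here) rise to the top even when no single tree ranks them first by a wide margin.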

Page 30

Efficient Training

• We seek classifiers and optimization algorithms that:
• Are massively parallelizable
• Don’t need to load the feature vectors (1 TB) into RAM
• Don’t need to load the label matrix (100 GB) into RAM

Number of training points: 200 million
Number of labels: 100 million
Dimensionality of feature vector: 10 million
Number of cores: 500 – 1000
RAM per core: 2 GB
Training time: 28 hours

Page 31

Multi-Label Random Forests

[Figure: label distributions (axis 0 to 0.2) at the two children of the split 'Is the word “insurance” present?']

• The splitting cost needs to be calculated in a $2^{10\mathrm{M}}$ space, the space of all subsets of the 10M labels

Page 32

Learning from Positively Labeled Data

[Figure: label distributions over l1, l2, l3 at the left and right children of the split $x_{f^*} > t^*$]

• Split condition: $x_{f^*} > t^*$

$$(f^*, t^*) = \arg\min_{f,t} \; n_l \sum_k p_l(l_k)\bigl(1 - p_l(l_k)\bigr) + n_r \sum_k p_r(l_k)\bigl(1 - p_r(l_k)\bigr)$$

$$p(l_k) = \sum_i p(l_k \mid \mathrm{ad}_i)\, p(\mathrm{ad}_i)$$

where $n_l$ and $n_r$ are the numbers of training ads sent to the left and right child, and $p_l$, $p_r$ are the label distributions in those children.
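The split criterion above can be sketched for a single feature as follows. The sketch assumes each ad spreads its probability mass uniformly over its observed positive labels ($p(l_k \mid \mathrm{ad}_i) = 1/|\mathbf{y}_i|$ for $l_k \in \mathbf{y}_i$) and that ads in a node are weighted equally ($p(\mathrm{ad}_i) = 1/n$); these are illustrative assumptions, and the helper names are hypothetical. Only positive labels are used, so no negative annotations are needed.

```python
from collections import Counter


def label_distribution(label_sets: list[set[str]]) -> dict[str, float]:
    """p(l_k) = sum_i p(l_k | ad_i) p(ad_i), with p(l_k | ad_i) = 1/|y_i| on the
    ad's positive labels and p(ad_i) = 1/n (both assumptions)."""
    n = len(label_sets)
    dist: Counter = Counter()
    for labels in label_sets:
        for label in labels:
            dist[label] += 1.0 / (len(labels) * n)
    return dict(dist)


def gini(label_sets: list[set[str]]) -> float:
    """Sum over labels of p(l_k) * (1 - p(l_k)) for the node's distribution."""
    if not label_sets:
        return 0.0
    return sum(p * (1.0 - p) for p in label_distribution(label_sets).values())


def best_threshold(values: list[float], label_sets: list[set[str]]):
    """Return (t*, cost) minimizing n_l * Gini(left) + n_r * Gini(right),
    where an ad goes right when x_f > t."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, label_sets) if not x > t]
        right = [y for x, y in zip(values, label_sets) if x > t]
        if not left or not right:
            continue  # skip splits that send every ad to one side
        cost = len(left) * gini(left) + len(right) * gini(right)
        if cost < best[1]:
            best = (t, cost)
    return best


# Toy data: TF-IDF weight of "insurance" in four ads, and their clicked queries.
insurance_weight = [0.0, 0.0, 0.7, 0.9]
clicked = [{"iphone cases"}, {"iphone cases", "apple"}, {"geico", "car insurance"}, {"car insurance"}]
print(best_threshold(insurance_weight, clicked))
```

On this toy data the best split separates the iphone ads from the insurance ads (threshold 0.0, cost 1.5), since that makes both children's label distributions most concentrated.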

Page 33

Multi-Label Random Forests

[Figure: per-point label distributions over l1, l2, l3 for $(\mathbf{x}_1, \mathbf{y}_1)$, $(\mathbf{x}_2, \mathbf{y}_2)$, $(\mathbf{x}_3, \mathbf{y}_3)$, and the aggregated node distribution $p(\mathbf{y})$]

$(\mathbf{x}_1, \mathbf{y}_1 = \{l_2, l_3\})$

$(\mathbf{x}_2, \mathbf{y}_2 = \{l_1, l_3\})$

$(\mathbf{x}_3, \mathbf{y}_3 = \{l_1, l_2, l_3\})$
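The slide's three training points can be worked through by hand. This sketch assumes each point spreads unit mass uniformly over its positive labels and that the three points are weighted equally; it uses exact fractions so the arithmetic is easy to check.

```python
from fractions import Fraction

# The three training points from the slide (labels only).
label_sets = [{"l2", "l3"}, {"l1", "l3"}, {"l1", "l2", "l3"}]
labels = ["l1", "l2", "l3"]


def per_point(y: set[str]) -> dict[str, Fraction]:
    """Spread unit mass uniformly over the point's positive labels (assumption)."""
    return {lab: (Fraction(1, len(y)) if lab in y else Fraction(0)) for lab in labels}


def node_distribution(sets: list[set[str]]) -> dict[str, Fraction]:
    """Average the per-point distributions with equal weight 1/n per point."""
    n = len(sets)
    return {lab: sum((per_point(y)[lab] for y in sets), Fraction(0)) / n for lab in labels}


p_y = node_distribution(label_sets)
for lab in labels:
    print(lab, p_y[lab], float(p_y[lab]))
```

Under these assumptions it gives 5/18 for l1 and l2 and 4/9 for l3: the label shared by all three points receives the most mass, and the distribution sums to 1.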

Page 34

Query Recommendation Data Sets

• Data set statistics

Data Set     # of Training Points (M)   # of Test Points (M)   # of Dimensions (M)   # of Labels (M)
Wikipedia    1.53                       0.66                   1.89                  0.97
Ads1         8.00                       0.50                   1.58                  1.22
Web          40.00                      1.50                   2.62                  1.22
Ads2         90.00                      5.00                   5.80                  9.70

Page 35

Performance Evaluation – Precision@k

• We use loss functions where the penalty incurred for predicting the real (but unknown) ground truth is never more than that of predicting any other labelling:

$$L(\mathbf{y}^*, \mathbf{y}_{\text{Observed}}) \le L(\mathbf{y}, \mathbf{y}_{\text{Observed}}) \quad \forall\, \mathbf{y} \in Y$$

• Hamming loss
• Precision at k
• We found precision at 10 to be robust for our application.
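Precision@k against observed (click) labels can be computed in a few lines. The function name and toy inputs are illustrative; dividing by k rather than by the number of returned predictions is a convention assumed here, so returning fewer than k labels is penalized.

```python
def precision_at_k(ranked_predictions: list[str], observed_labels: set[str], k: int = 10) -> float:
    """Fraction of the top-k predictions that appear among the observed labels."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_predictions[:k]
    hits = sum(1 for label in top_k if label in observed_labels)
    return hits / k


predictions = ["geico", "geico car insurance", "car insurance", "pizza"]
clicked = {"geico", "car insurance", "need cheap auto insurance"}
print(precision_at_k(predictions, clicked, k=4))  # 2 of the top 4 were clicked
```

Because only hits among observed labels count, a missing label can make a correct prediction look like a miss, but it never makes a wrong prediction look like a hit.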

Page 36

Query Recommendation Results

[Figure: bar chart comparing MLRF and KEX on the Wikipedia, Ads1, Web and Ads2 data sets; y-axis (0 to 30): percentage of top 10 predictions that were clicked queries]

Page 37

Query Recommendation Results

[Figure: bar chart comparing MLRF and KEX on the Wikipedia, Ads1, Web and Ads2 data sets; y-axis (0 to 60): percentage of top 10 predictions that were relevant]

Page 38

Page 39

Geico Car Insurance

KEX MLRF

geico auto insurance

geico car insurance

geico insurance

www geico com

care geicos

geico com

need cheap auto insurance

wisconsin cheap car insurance quotes

cheap auto insurance florida

all state car insurance coupon code

Page 40

Page 41

Domino’s Pizza

KEX MLRF

dominos

dominos pizza

domino pizza

domino pasta bowls

domino pizza coupons

domino pizza deals

domino pizza locations

domino pizza menu

domino pizza online

Page 42

Page 43

Simone & Sylvia Kid’s Clothing

KEX MLRF

plastic ponies toddlers clothes

simone toddlers clothing

plastics toddler costumes

clothing and accessories children clothes sale

sylvia children clothes

pony clothing designer children clothes

couture cute children clothes

playground retro clothing

Plastic recycling retro baby clothes

children's clothing baby clothing

Page 44

Page 45

KCS Flowers

KEX MLRF

funeral flowers flowers delivery

sympathy funeral flowers funeral arrangements

web home birthday flowers

bleitz funeral home funeral flowers

funeral flowers discount funeral planning

yarington's funeral home flowers valentines

harvey funeral home free delivery flowers

green lake funeral home cheap flowers

howden kennedy funeral home

florists

arranging flowers cheap flowers funeral

Page 46

Page 47

Vistaprint Designer T-Shirts

KEX MLRF

embroidered apparel custom t shirts

custom apparel funny t shirts

readymade apparel hanes beefy t shirts

customizable hanes t shirts

apparel long sleeve t shirts

customizable apparel personalized t shirts

leading print printed t shirts

online business cards retro gamer t shirts

apparel and accessories t shirts

own text buy custom t shirts

Page 48

Page 49

Metlife Auto Insurance

KEX MLRF

metlife auto home insurance metlife auto insurance

auto home insurance auto Insurance

auto insurance car Insurance

massachusetts automobile Insurance

metlife agent geico insurance

driver discount cheap car insurance

additional cost metlife auto

saving benefits insurance broker

car discount insurance

auto quote home insurance

Page 50

Page 51

Wanta Thai Restaurant

KEX MLRF

authentic thai restaurant thai restaurant

delicious thai food thai restaurants

thai cuisine mexican restaurants

thai restaurant cheap hotels

thai food hotels

wanta fast food restaurants

best thai restaurant restaurants coupons

thai eateries best web hosting

restaurants

thai vegetarian foods

contemporary thai new york restaurants

Page 52

Page 53

best italian restaurants philadelphia

italian restaurant chains

italian restaurants italian restaurant connecticut

italian restaurant italian restaurant district columbia

italian restaurants arkansas thai restaurant

italian restaurants connecticut thai restaurants

italian restaurants idaho restaurants

italian restaurants phoenix mexican restaurants

Page 54

Page 55

Compensating for Missing Labels

Progressive insurance Allstate auto insurance

American family insurance

Esurance Auto insurance quotes

Case-mate phone cases

Maggiano’s restaurant

0.9

0.7 0.5

0.8

Page 56

Training on Belief Vectors

[Figure: per-point label belief vectors over l1, l2, l3 for $(\mathbf{x}_1, \mathbf{f}_1)$, $(\mathbf{x}_2, \mathbf{f}_2)$, $(\mathbf{x}_3, \mathbf{f}_3)$, and the aggregated node distribution $p(\mathbf{f})$]

$(\mathbf{x}_1, \mathbf{y}_1 = \{l_2, l_3\}, \mathbf{f}_1)$

$(\mathbf{x}_2, \mathbf{y}_2 = \{l_1, l_3\}, \mathbf{f}_2)$

$(\mathbf{x}_3, \mathbf{y}_3 = \{l_1, l_2, l_3\}, \mathbf{f}_3)$

Page 57

Sparse Semi-Supervised Learning

• Graph-based SSL optimizes label belief smoothness and fidelity to the original labels

$$\mathbf{F}^* = \arg\min_{\mathbf{F}} \; \frac{1}{2}\operatorname{Tr}\left(\mathbf{F}^\top \left(\mathbf{I} - \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}\right)\mathbf{F}\right) + \frac{\lambda}{2}\|\mathbf{F} - \mathbf{Y}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{F}\|_0 \le K$$

$\mathbf{W}_{M \times M}$: document-document similarity matrix
$\mathbf{D}_{M \times M}$: diagonal matrix of the row sums of $\mathbf{W}$
$\mathbf{Y}_{M \times L}$: 0/1 label matrix
$\mathbf{F}_{M \times L}$: real-valued label belief matrix
$\lambda$: trade-off hyperparameter
M: number of documents
L: number of labels
K: sparsity constant

Page 58

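Sparse Semi-Supervised Learning

• Graph-based SSL optimizes label belief smoothness and fidelity to the original labels. Written element-wise (for symmetric $\mathbf{W}$):

$$\mathbf{F}^* = \arg\min_{\mathbf{F}} \; \frac{1}{4}\sum_{i=1}^{L}\sum_{j=1}^{M}\sum_{l=1}^{M} w_{jl}\left(\frac{F_{ji}}{\sqrt{D_{jj}}} - \frac{F_{li}}{\sqrt{D_{ll}}}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{M}\sum_{i=1}^{L}\left(F_{ji} - Y_{ji}\right)^2 \quad \text{s.t.} \quad \|\mathbf{F}\|_0 \le K$$

$\mathbf{W}_{M \times M}$: document-document similarity matrix
$\mathbf{D}_{M \times M}$: diagonal matrix of the row sums of $\mathbf{W}$
$\mathbf{Y}_{M \times L}$: 0/1 label matrix
$\mathbf{F}_{M \times L}$: real-valued label belief matrix
$\lambda$: trade-off hyperparameter
M: number of documents
L: number of labels
K: sparsity constant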
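As a numerical sanity check, the sketch below evaluates the matrix (trace) form and the element-wise form of the SSL objective on a small random example and confirms they agree. It uses NumPy, assumes $\mathbf{W}$ is symmetric with positive row sums (which is what makes $\frac{1}{2}\operatorname{Tr}(\cdot)$ equal one quarter of the weighted double sum), and ignores the $\ell_0$ constraint, which restricts the feasible set but not the value of the objective.

```python
import numpy as np


def objective_matrix(F, W, Y, lam):
    """(1/2) Tr(F^T (I - D^{-1/2} W D^{-1/2}) F) + (lam/2) ||F - Y||_F^2."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    L_norm = np.eye(W.shape[0]) - S                     # normalized graph Laplacian
    return 0.5 * np.trace(F.T @ L_norm @ F) + 0.5 * lam * np.sum((F - Y) ** 2)


def objective_elementwise(F, W, Y, lam):
    """(1/4) sum_i sum_{j,l} w_jl (F_ji/sqrt(D_jj) - F_li/sqrt(D_ll))^2 + (lam/2) sum (F - Y)^2."""
    D = W.sum(axis=1)
    M, L = F.shape
    smooth = 0.0
    for i in range(L):          # labels
        for j in range(M):      # documents
            for l in range(M):  # documents
                diff = F[j, i] / np.sqrt(D[j]) - F[l, i] / np.sqrt(D[l])
                smooth += W[j, l] * diff ** 2
    return 0.25 * smooth + 0.5 * lam * np.sum((F - Y) ** 2)


rng = np.random.default_rng(0)
M, L, lam = 6, 4, 0.5
A = rng.random((M, M))
W = (A + A.T) / 2                                # symmetric, positive similarities
Y = (rng.random((M, L)) < 0.3).astype(float)     # toy 0/1 label matrix
F = rng.random((M, L))                           # arbitrary belief matrix
print(objective_matrix(F, W, Y, lam), objective_elementwise(F, W, Y, lam))
```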

Page 59

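Iterative Hard Thresholding

• Sparse SSL formulation

$$\mathbf{F}^* = \arg\min_{\mathbf{F}} \; J(\mathbf{F}) = \frac{1}{2}\operatorname{Tr}\left(\mathbf{F}^\top \left(\mathbf{I} - \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}\right)\mathbf{F}\right) + \frac{\lambda}{2}\|\mathbf{F} - \mathbf{Y}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{F}\|_0 \le K$$

• The iterative hard thresholding algorithm converges to a global or local optimum:

$$\mathbf{F}^0 = \mathbf{Y}$$

$$\mathbf{F}^{t+\frac{1}{2}} = \frac{1}{\lambda + 1}\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}\mathbf{F}^t + \frac{\lambda}{\lambda + 1}\mathbf{Y}$$

$$\mathbf{F}^{t+1} = \operatorname{Top}_K\left(\mathbf{F}^{t+\frac{1}{2}}\right)$$

where $\operatorname{Top}_K$ keeps the K largest entries and sets the rest to zero.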
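A dense NumPy sketch of the iteration, suitable only for small matrices. It assumes $\mathbf{W}$ is symmetric with positive row sums, takes $\operatorname{Top}_K$ to keep the K largest entries of the whole matrix (ties broken arbitrarily), and runs a fixed number of iterations instead of testing for convergence. The function names `top_k` and `iht_sparse_ssl` are hypothetical.

```python
import numpy as np


def top_k(F: np.ndarray, K: int) -> np.ndarray:
    """Keep the K largest entries of F (over the whole matrix); zero the rest."""
    if K <= 0:
        return np.zeros_like(F)
    flat = F.ravel()
    if K >= flat.size:
        return F.copy()
    out = np.zeros_like(flat)
    keep = np.argpartition(flat, -K)[-K:]  # indices of the K largest values
    out[keep] = flat[keep]
    return out.reshape(F.shape)


def iht_sparse_ssl(W: np.ndarray, Y: np.ndarray, lam: float, K: int, iters: int = 100) -> np.ndarray:
    """F^0 = Y; F^{t+1/2} = S F^t / (lam + 1) + lam Y / (lam + 1); F^{t+1} = Top_K(F^{t+1/2}),
    with S = D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    F = Y.astype(float)
    for _ in range(iters):
        F_half = (S @ F) / (lam + 1.0) + (lam / (lam + 1.0)) * Y
        F = top_k(F_half, K)
    return F


rng = np.random.default_rng(1)
M, L = 8, 5
A = rng.random((M, M))
W = (A + A.T) / 2                              # toy symmetric similarity matrix
Y = (rng.random((M, L)) < 0.3).astype(float)   # toy 0/1 label matrix
F_sparse = iht_sparse_ssl(W, Y, lam=1.0, K=10)
print(np.count_nonzero(F_sparse))
```

The half step is the stationarity condition of $J$ solved for $\mathbf{F}$ with the previous iterate on the right-hand side; with K = ML (no sparsity) the iteration converges to $\lambda\left((1+\lambda)\mathbf{I} - \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}\right)^{-1}\mathbf{Y}$, which gives a convenient check.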

Page 60

Iterative Hard Thresholding

• If $Y_{ij} \in \{0, 1\}$ and $\mathbf{W}$ is positive definite, then:
• The sequence $\mathbf{F}^0, \mathbf{F}^1, \ldots$ converges to a stationary point $\mathbf{F}^*$.
• $J(\mathbf{F}^0) \ge J(\mathbf{F}^1) \ge \cdots \ge J(\mathbf{F}^*)$
• If $\|\mathbf{F}^*\|_0 < K$, then $\mathbf{F}^*$ is a globally optimal solution.
• If $\|\mathbf{F}^*\|_0 = K$, then $\mathbf{F}^*$ is a locally optimal solution.

$$J(\mathbf{F}^*) - J(\mathbf{F}^+) \le \min\left(\frac{\lambda}{2}\left(K + \|\mathbf{Y}\|_0\right), \; \frac{\lambda + 1}{2}(ML - K)\,\alpha_K(\mathbf{F}^*)\sqrt{\|\mathbf{Y}\|_0}\right)$$

Page 61

Semi-Supervised Learning Results

             Click Labels (%)               Human Verification (%)
Data Set     MLRF    MLRF+SSL   KEX         MLRF    MLRF+SSL   KEX
Wikipedia    15.72   18.53      11.63       24.46   27.17      17.51
Ads1         18.13   19.88      11.96       45.86   47.53      41.95
Bing         22.51   25.32      18.42       50.47   51.83      47.69
Ads2         15.91   17.12      12.45       41.28   43.78      36.69

• Precision@10 as judged by automatically generated click labels as well as by human experts.

Page 62

Query Expansion Results

             Click Labels (%)               Human Verification (%)
Data Set     MLRF+SSL+KSP   KEX+KSP         MLRF+SSL+KSP   KEX+KSP
Wikipedia    18.01          10.81           31.48          22.14
Ads1         21.54          12.38           51.08          43.27
Web          26.66          19.88           53.69          48.13
Ads2         19.24          14.35           46.77          40.07

• Query expansion techniques can help both KEX and MLRF

Page 63

Query Recommendation Results

• Edit distance [Ravi et al. WSDM 2010], measured against click labels (lower is better)

Data Set     KEX    KEX+KSP   MLRF   MLRF+SSL   MLRF+SSL+KSP
Wikipedia    0.81   0.78      0.71   0.66       0.63
Ads1         0.83   0.76      0.71   0.65       0.61
Web          0.73   0.68      0.65   0.62       0.58
Ads2         0.77   0.73      0.69   0.63       0.59

Page 64

Conclusions

• Query recommendation can be posed as multi-label learning.
• Learning with millions of labels can be tractable and accurate.
• Other applications:
• Query expansion.
• Document and ad relevance and ranking.
• Fine-grained query intent classification.

Page 65

Acknowledgements

• Deepak Bapna
• Prateek Jain
• A. Kumaran
• Mehul Parsana
• Krishna Leela Poola
• Adarsh Prasad
• Varun Singla

Page 66

Advantages of an ML Approach

• Can generalize to other domains such as images on Flickr or videos on YouTube.

Page 67

System Architecture

• We leverage the Map/Reduce framework.
• Trees are grown in parallel, breadth-first.
• Number of compute nodes: Evaluators 500, Combiners 100, Maximizers 25.
• Our objective is to balance the compute load across machines while minimizing data flow.

[Figure: data shards $(X_1, Y_1)$ to $(X_N, Y_N)$, $(X_{N+1}, Y_{N+1})$ to $(X_{2N}, Y_{2N})$, $(X_{2N+1}, Y_{2N+1})$ to $(X_{3N}, Y_{3N})$ and $(X_{3N+1}, Y_{3N+1})$ to $(X_{4N}, Y_{4N})$ feed Evaluators 1 to 4; their outputs pass through Combiners 1 to 3 to Maximizers 1 and 2, which output F*, T*]

Page 68

Evaluators

[Figure: system architecture diagram with data shards, Evaluators 1 to 4, Combiners 1 to 3 and Maximizers 1 and 2, producing F*, T*]

• Input: N training instances and a set of keys (tree ID, node ID, feature ID and threshold)
• Output: partial label distributions for the keys
• Computation: N × (# of keys)

Page 69

Combiners

[Figure: system architecture diagram with data shards, Evaluators 1 to 4, Combiners 1 to 3 and Maximizers 1 and 2, producing F*, T*]

• Input: partial label distributions for assigned keys
• Output: objective function values for the keys
• Computation: (# of keys) × (avg. # of evaluators per key) × (# of labels in the distribution for the key)

Page 70

Maximizers

[Figure: system architecture diagram with data shards, Evaluators 1 to 4, Combiners 1 to 3 and Maximizers 1 and 2, producing F*, T*]

• Input: objective function values for assigned keys
• Output: optimal feature and threshold for assigned nodes in trees
• Computation: (# of keys) × (avg. # of features per key) × (avg. # of thresholds per feature)
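The three stages can be simulated in one process to show what flows between them. This is a sketch under stated assumptions, not the authors' system and not any Map/Reduce API: keys are (tree ID, node ID, feature ID, threshold) tuples as on the slides; each ad spreads unit mass uniformly over its positive labels; evaluators emit per-side ad counts and label mass for their shard; combiners sum the partials and compute the Gini-style split cost from the "Learning from Positively Labeled Data" slide; and the "maximizer" picks the lowest cost per (tree, node), which is the same as maximizing the negated cost. The data below is made up.

```python
from collections import defaultdict


def _empty_stats():
    """([ad count left, right], [label mass left, right]); side 0 = left, 1 = right."""
    return [0, 0], [defaultdict(float), defaultdict(float)]


def evaluator(shard, keys):
    """Map stage: for one data shard, accumulate per-key ad counts and label
    mass on each side of the split 'feature > threshold'."""
    out = {}
    for nodes, features, labels in shard:
        for key in keys:
            tree, node, feat, thr = key
            if nodes.get(tree) != node or not labels:
                continue  # ad is not in this node, or has no positive labels
            side = 1 if features.get(feat, 0.0) > thr else 0
            counts, mass = out.setdefault(key, _empty_stats())
            counts[side] += 1
            for label in labels:
                mass[side][label] += 1.0 / len(labels)  # uniform spread (assumption)
    return out


def combiner(partials):
    """Reduce stage 1: sum partial statistics per key, then compute
    n_l * sum_k p_l(1 - p_l) + n_r * sum_k p_r(1 - p_r)."""
    totals = {}
    for partial in partials:
        for key, (counts, mass) in partial.items():
            t_counts, t_mass = totals.setdefault(key, _empty_stats())
            for side in (0, 1):
                t_counts[side] += counts[side]
                for label, m in mass[side].items():
                    t_mass[side][label] += m
    costs = {}
    for key, (counts, mass) in totals.items():
        if 0 in counts:
            continue  # degenerate split: one child would be empty
        costs[key] = sum(
            counts[s] * sum((m / counts[s]) * (1.0 - m / counts[s]) for m in mass[s].values())
            for s in (0, 1)
        )
    return costs


def maximizer(costs):
    """Reduce stage 2: per (tree, node), keep the (feature, threshold, cost)
    with the lowest split cost."""
    best = {}
    for (tree, node, feat, thr), cost in costs.items():
        if (tree, node) not in best or cost < best[(tree, node)][2]:
            best[(tree, node)] = (feat, thr, cost)
    return best


# Toy run: two shards, all ads in node 0 of tree 0, three candidate keys.
keys = [(0, 0, "insurance", 0.0), (0, 0, "insurance", 0.7), (0, 0, "geico", 0.0)]
shard_a = [({0: 0}, {"iphone": 0.5}, {"iphone cases"}),
           ({0: 0}, {"iphone": 0.4, "apple": 0.3}, {"iphone cases", "apple"})]
shard_b = [({0: 0}, {"insurance": 0.7, "geico": 0.9}, {"geico", "car insurance"}),
           ({0: 0}, {"insurance": 0.9}, {"car insurance"})]
partials = [evaluator(shard, keys) for shard in (shard_a, shard_b)]
print(maximizer(combiner(partials)))
```

Only compact per-key statistics cross stage boundaries, never the raw feature vectors, which is in the spirit of the slide's goal of minimizing data flow.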

