Large Scale Multi-Label Classification via MetaLabeler
Lei Tang Arizona State University
Suju Rajan and Vijay K. Narayanan Yahoo! Data Mining & Research
Yahoo! Data Mining & Research
Large Scale Multi-Label Classification
• Huge number of instances and categories
• Common for online content
Web Page Classification
Query Categorization
Video Annotation/Organization
Social Bookmark/Tag Recommendation
Challenges
• Multi-Class: thousands of categories
• Multi-Label: each instance can have more than one label
• Large Scale: huge number of instances and categories
  – Our query categorization problem: 1.5M queries, 7K categories
  – Yahoo! Directory: 792K docs, 246K categories (Liu et al. 2005)
• Most existing multi-label methods do not scale
  – Structural SVM, mixture model, collective inference, maximum-entropy model, etc.
• The simplest approach, One-vs-Rest SVM, is still widely used
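One-vs-Rest SVM
• Training data (instance: labels)
  – x1: C1, C3
  – x2: C1, C2, C4
  – x3: C2
  – x4: C2, C4
• One binary training set per category (+ = has the label, - = does not):

        C1   C2   C3   C4
  x1    +    -    +    -
  x2    +    +    -    +
  x3    -    +    -    -
  x4    -    +    -    +

• SVM1, SVM2, SVM3 and SVM4 are trained on the columns for C1, C2, C3 and C4
• Predict: each SVMi independently decides whether to assign Ci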
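The binarization in the table can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `binarize_one_vs_rest` is hypothetical, and the binary SVM training that would consume each column is left out.

```python
from typing import Dict, List, Set


def binarize_one_vs_rest(labels: Dict[str, Set[str]],
                         categories: List[str]) -> Dict[str, Dict[str, int]]:
    """For each category c, map every instance to +1 if it carries c, else -1."""
    return {
        c: {x: (1 if c in labs else -1) for x, labs in labels.items()}
        for c in categories
    }


# Multi-label training data from the slide.
train = {
    "x1": {"C1", "C3"},
    "x2": {"C1", "C2", "C4"},
    "x3": {"C2"},
    "x4": {"C2", "C4"},
}
binary_sets = binarize_one_vs_rest(train, ["C1", "C2", "C3", "C4"])
for c, column in binary_sets.items():
    # Each column would be handed to its own binary SVM (SVM1..SVM4).
    print(c, column)
```

Because every column is built and trained independently, the per-category SVMs can be trained in parallel, which is the scalability argument made for One-vs-Rest.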
One-vs-Rest SVM
• Pros:
  – Simple, fast, scalable
  – Each label is trained independently, so training is easy to parallelize
• Cons:
  – Highly skewed class distribution (few +, many -)
  – Biased prediction scores
• Outputs a reasonably good ranking (Rifkin and Klautau 2004)
  – E.g. 4 categories C1, C2, C3, C4
  – True labels for x1: C1, C3
  – Prediction scores: {s1, s3} > {s2, s4}
• Can we predict the number of labels?
MetaLabeler Algorithm
1. Obtain a ranking of class membership for each instance
  – Any generic ranking algorithm can be applied
  – We use One-vs-Rest SVM
2. Build a Meta Model to predict the number of top classes
  – Construct Meta Label
  – Construct Meta Feature
  – Build Meta Model
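Step 2 needs a training set for the meta model. A minimal Python sketch of how that set might be assembled, assuming each instance already has a feature vector and a label set: the meta label is the number of true labels, and the meta feature is whichever representation is chosen (content-, score- or rank-based). `build_meta_data` and the toy feature vectors are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def build_meta_data(
    features: Dict[str, Sequence[float]],
    labels: Dict[str, Set[str]],
    meta_feature: Callable[[Sequence[float]], List[float]],
) -> List[Tuple[List[float], int]]:
    """Pair each instance's meta feature with its meta label (= #labels)."""
    return [(meta_feature(features[name]), len(labels[name])) for name in labels]


# Label sets from the One-vs-Rest example; the 2-d feature vectors are made up.
labels = {
    "x1": {"C1", "C3"},
    "x2": {"C1", "C2", "C4"},
    "x3": {"C2"},
    "x4": {"C2", "C4"},
}
features = {"x1": [1.0, 0.0], "x2": [0.5, 0.5], "x3": [0.0, 1.0], "x4": [0.2, 0.8]}

# Content-based meta feature: the raw vector itself (copied into a list).
meta_data = build_meta_data(features, labels, meta_feature=list)
print([k for _, k in meta_data])  # [2, 3, 1, 2]
```

Any multi-class classifier or regressor can then be fit on `meta_data`; the ranking model from step 1 is trained separately and is untouched by this step.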
Meta Model – Training
• Training queries and their labels:
  – Q1 = affordable cocktail dress. Labels: Formal wear, Women Clothing
  – Q2 = cotton children jeans. Labels: Children clothing
  – Q3 = leather fashion in 1990s. Labels: Fashion, Women Clothing, Leather Clothing
• Meta data (query: #labels): Q1: 2, Q2: 1, Q3: 3
• Meta-Model: One-vs-Rest SVM, or regression
  – With regression, how to handle predictions like 2.5 labels?
[Figure: categories Clothing, Women Clothing, Formal wear, Fashion, Children Clothing, Leather clothing]
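A classification meta model (e.g. One-vs-Rest SVM over the possible label counts) always outputs an integer count. If a regression model is used instead, its output must be mapped back to an integer. One simple option, sketched below as an assumption rather than the method used in the slides, is to round half up and clip to the range of counts seen in training. Python's built-in `round` uses round-half-to-even (`round(2.5) == 2`), so the sketch uses `math.floor(x + 0.5)` instead; `to_label_count` is a hypothetical name.

```python
import math


def to_label_count(prediction: float, max_labels: int) -> int:
    """Map a real-valued meta-model output to a usable number of labels.

    Rounds half up (2.5 -> 3) and clips to [1, max_labels]; both the
    rounding rule and the clipping range are illustrative choices.
    """
    k = math.floor(prediction + 0.5)
    return max(1, min(max_labels, k))


# Meta data from the slide: Q1: 2, Q2: 1, Q3: 3
meta_labels = {"Q1": 2, "Q2": 1, "Q3": 3}
max_labels = max(meta_labels.values())

print(to_label_count(2.5, max_labels))  # 3
print(to_label_count(0.3, max_labels))  # 1 (at least one label)
print(to_label_count(4.2, max_labels))  # 3 (never more than seen in training)
```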
Meta Feature Construction
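• Content-Based
  – Use raw data
  – Raw data contains all the information
• Score-Based
  – Use prediction scores
  – Bias in the scores might be learned
• Rank-Based
  – Use sorted prediction scores
• Example prediction scores: C1 = 0.9, C2 = -0.2, C3 = 0.7, C4 = -0.6
  – Score-based meta feature: (0.9, -0.2, 0.7, -0.6), in category order C1 to C4
  – Rank-based meta feature: (0.9, 0.7, -0.2, -0.6), sorted in descending order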
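The three meta feature types can be written as small functions of an instance's raw features or its one-vs-rest scores. A minimal Python sketch; the function names are hypothetical and the scores are the example values above.

```python
from typing import List, Sequence


def content_meta_feature(x: Sequence[float]) -> List[float]:
    # Content-based: reuse the raw instance features unchanged.
    return list(x)


def score_meta_feature(scores: Sequence[float]) -> List[float]:
    # Score-based: one-vs-rest scores kept in category order C1..Ck,
    # so position i always refers to category Ci.
    return list(scores)


def rank_meta_feature(scores: Sequence[float]) -> List[float]:
    # Rank-based: the same scores sorted in descending order, so position i
    # holds the i-th highest score regardless of which category produced it.
    return sorted(scores, reverse=True)


scores = [0.9, -0.2, 0.7, -0.6]    # C1, C2, C3, C4 from the slide
print(score_meta_feature(scores))  # [0.9, -0.2, 0.7, -0.6]
print(rank_meta_feature(scores))   # [0.9, 0.7, -0.2, -0.6]
```

The rank-based feature has the same length as the score-based one but discards category identity, which is one way to read why it can be less informative than raw content.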
MetaLabeler Prediction
• Given one instance:
  – Obtain the rankings for all labels
  – Use the meta model to predict the number of labels
  – Pick the top-ranking labels
• MetaLabeler
  – Easy to implement
  – Uses existing SVM packages/software directly
  – Can be combined with a hierarchical structure easily
    • Simply build a Meta Model at each internal node
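The three prediction steps can be sketched as follows. This is a minimal Python sketch under stated assumptions: `metalabeler_predict` is a hypothetical name, the meta model is any callable that maps a meta feature to a label count, and `toy_meta_model` is a made-up stand-in for a trained meta model (it just counts positive scores).

```python
from typing import Callable, Dict, List, Sequence


def metalabeler_predict(
    scores: Dict[str, float],
    meta_model: Callable[[Sequence[float]], int],
) -> List[str]:
    # 1. Rank all labels by their one-vs-rest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # 2. Ask the meta model how many labels this instance should get
    #    (here it is given the rank-based meta feature).
    k = meta_model(sorted(scores.values(), reverse=True))
    # 3. Keep the k top-ranking labels, but always at least one.
    return ranked[:max(1, k)]


def toy_meta_model(feature: Sequence[float]) -> int:
    # Hypothetical stand-in: a trained classifier or regressor would go here.
    return sum(s > 0 for s in feature)


scores = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}
print(metalabeler_predict(scores, toy_meta_model))  # ['C1', 'C3']
```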
Baseline Methods
• Existing thresholding methods (Yang 2001)
  – Rank-based Cut (RCut)
    • Output a fixed number of top-ranking labels for each instance
  – Proportion-based Cut (PCut)
    • For each label, choose a portion of test instances as positive
    • Not applicable for online prediction
  – Score-based Cut (SCut, a.k.a. threshold tuning)
    • For each label, determine a threshold based on cross-validation
    • Tends to overfit and is not very stable
• MetaLabeler: a local RCut method
  – Customizes the number of labels for each instance
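The difference between the cut strategies can be seen on a single score vector. A minimal Python sketch; the function names and the per-label thresholds are hypothetical (SCut thresholds would normally be tuned by cross-validation), and PCut is omitted because it needs the whole test batch.

```python
from typing import Dict, List


def rcut(scores: Dict[str, float], t: int) -> List[str]:
    """RCut: the same number t of top-ranking labels for every instance."""
    return sorted(scores, key=scores.get, reverse=True)[:t]


def scut(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """SCut: predict label c when its score exceeds c's own threshold."""
    return [c for c, s in scores.items() if s > thresholds[c]]


def metalabeler_cut(scores: Dict[str, float], k: int) -> List[str]:
    """MetaLabeler as a local RCut: t becomes a per-instance count k."""
    return rcut(scores, k)


scores_x1 = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}
print(rcut(scores_x1, t=1))             # ['C1']
thresholds = {"C1": 0.5, "C2": 0.0, "C3": 0.8, "C4": 0.0}  # made-up values
print(scut(scores_x1, thresholds))      # ['C1']
print(metalabeler_cut(scores_x1, k=2))  # ['C1', 'C3']
```

With a global t or per-label thresholds, x1 (true labels C1, C3) loses C3; a per-instance k, as MetaLabeler predicts, can recover both.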
Publicly Available Benchmark Data
• Yahoo! Web Page Classification
  – 11 data sets:
    • Each constructed from a top-level category
    • 2nd-level topics are the categories
  – 16-32k instances, 6-15k features, 14-23 categories
  – 1.2-1.6 labels per instance, maximum 17 labels
  – Each label has at least 100 instances
• RCV1:
  – A large-scale text corpus
  – 101 categories, 3.2 labels per instance
  – For evaluation purposes, 3,000 instances are used for training and 3,000 for testing
  – Highly skewed distribution (some labels have only 3-4 instances)
MetaLabeler of Different Meta Features
• Which type of meta feature is more predictive?
• MetaLabeler with content-based meta features outperforms the score- and rank-based variants
[Figure: Exact Match Ratio, Micro-F1 and Macro-F1 for content-, score- and rank-based meta features; panels: Yahoo! and RCV1]
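The figures report Exact Match Ratio, Micro-F1 and Macro-F1 without defining them. The sketch below uses the standard definitions, which is an assumption about how the numbers were computed: exact match counts instances whose predicted label set equals the true set, Micro-F1 pools true/false positives and false negatives over all labels, and Macro-F1 averages per-label F1. F1 is written as 2TP / (2TP + FP + FN), which equals 2PR / (P + R); returning 0.0 when a label never appears is one convention among several.

```python
from typing import List, Sequence, Set, Tuple

LabelSets = Sequence[Set[str]]


def exact_match_ratio(true: LabelSets, pred: LabelSets) -> float:
    """Fraction of instances whose predicted label set equals the true set."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def label_counts(true: LabelSets, pred: LabelSets, c: str) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for label c."""
    tp = sum(c in t and c in p for t, p in zip(true, pred))
    fp = sum(c not in t and c in p for t, p in zip(true, pred))
    fn = sum(c in t and c not in p for t, p in zip(true, pred))
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention for undefined F1


def micro_f1(true: LabelSets, pred: LabelSets, categories: List[str]) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    totals = [label_counts(true, pred, c) for c in categories]
    return f1(*(sum(col) for col in zip(*totals)))


def macro_f1(true: LabelSets, pred: LabelSets, categories: List[str]) -> float:
    """Compute F1 per label, then average with equal weight per label."""
    return sum(f1(*label_counts(true, pred, c)) for c in categories) / len(categories)


# Toy example: true labels from the One-vs-Rest slide, x2 misses C4.
true = [{"C1", "C3"}, {"C1", "C2", "C4"}, {"C2"}, {"C2", "C4"}]
pred = [{"C1", "C3"}, {"C1", "C2"}, {"C2"}, {"C2", "C4"}]
cats = ["C1", "C2", "C3", "C4"]
print(exact_match_ratio(true, pred))  # 0.75
print(micro_f1(true, pred, cats), macro_f1(true, pred, cats))
```

Macro-F1 gives rare labels the same weight as frequent ones, which is why it is the metric most affected by skewed label distributions such as RCV1's.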
Performance Comparison
• MetaLabeler tends to outperform other methods
[Figure: Exact Match Ratio, Micro-F1 and Macro-F1 for SVM, RCut, SCut and MetaLabeler; panels: Yahoo! and RCV1]
Bias with MetaLabeler
• The distribution of the number of labels is imbalanced
  – Most instances have a small number of labels
  – A small portion of instances have many more labels
• The imbalanced distribution biases MetaLabeler
  – It prefers to predict fewer labels
  – It predicts many labels only with strong confidence
[Figure: Label Distribution on Yahoo! Society Data; x-axis: number of labels (1 to 7); y-axis: frequency; bars: Ground Truth vs. MetaLabeler Prediction]
Scalability Study
• Threshold tuning requires cross-validation; otherwise it overfits
• MetaLabeler simply adds some meta labels and learns One-vs-Rest SVMs
[Figure: Computation Time Comparison on Yahoo! Society Data; x-axis: number of samples for training (1,000 to 10,000); y-axis: computation time (seconds); curves: SVM, MetaLabeler, Threshold Tuning]
Scalability Study (cont.)
• Threshold tuning: cost grows linearly with the number of categories in the data
  – E.g. 6000 categories -> 6000 thresholds to be tuned
• MetaLabeler: upper-bounded by the maximum number of labels of any single instance
  – E.g. 6000 categories, but one instance has at most 15 labels
  – Only 15 additional binary SVMs need to be learned
• The meta model is “independent” of the number of categories
Application to Large Scale Query Categorization
• Query categorization problem:
  – 1.5 million unique queries: 1M for training, 0.5M for testing
  – 120k features
  – An 8-level taxonomy of 6433 categories
• Multiple labels, e.g. the query “0% interest credit card no transfer fee” has the labels:
  – Financial Services/Credit, Loans and Debt/Credit/Credit Card/Balance Transfer
  – Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low Interest Card
  – Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low-No-fee Card
• Number of labels per query: 1 label 81%, 2 labels 16%, 3+ labels 3%
  – 1.23 labels on average
  – At most 26 labels
Flat Model
• Flat Model: does not leverage the hierarchical structure
  – Threshold tuning on the training data alone takes 40 hours to finish, while MetaLabeler takes 2 hours
[Figure: Micro-F1 at each taxonomy depth (1 to 8) for SVM, MetaLabeler and Threshold Tuning]
Hierarchical Model - Training
[Figure: taxonomy tree from the Root; the Training Data at an internal node N becomes New Training Data, with an added “Other” category]
Step 1: Generate Training Data
Step 2: Roll up labels
Step 3: Create “Other” Category
Step 4: Train One vs. Rest SVM
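The slides do not spell out the roll-up, so the sketch below encodes one plausible reading as an assumption: at a node N, keep the instances that have a label at or below N, replace each such label by N's child on the label's path, and map a label equal to N itself to “Other”. Categories are represented as path tuples; `node_training_data` and the query “credit advice” are hypothetical. Step 4 (training the One-vs-Rest SVMs on the result) is not shown.

```python
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]  # a category written as its path from the root


def node_training_data(labels: Dict[str, Set[Path]], node: Path) -> Dict[str, Set[str]]:
    """Multi-label training set for the one-vs-rest SVMs at `node` (assumed semantics).

    Step 1: keep only instances with at least one label at or below `node`.
    Step 2: roll each such label up to the child of `node` on its path.
    Step 3: a label that is `node` itself maps to the "Other" category.
    """
    depth = len(node)
    data: Dict[str, Set[str]] = {}
    for inst, paths in labels.items():
        rolled = set()
        for p in paths:
            if p[:depth] != node:
                continue  # label belongs to a different subtree
            rolled.add(p[depth] if len(p) > depth else "Other")
        if rolled:
            data[inst] = rolled
    return data


CARD = ("Financial Services", "Credit, Loans and Debt", "Credit", "Credit Card")
labels = {
    # The multi-label query from the slide.
    "0% interest credit card no transfer fee": {
        CARD + ("Balance Transfer",),
        CARD + ("Low Interest Card",),
        CARD + ("Low-No-fee Card",),
    },
    # Hypothetical query labeled with an internal node.
    "credit advice": {CARD[:3]},
}
print(node_training_data(labels, CARD[:3]))
```

At the node “Credit”, the slide's query rolls up to “Credit Card”, while the hypothetical query stops at “Credit” and becomes “Other”; at the node “Credit Card”, only the slide's query remains, with its three leaf labels.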
Hierarchical Model - Prediction
• Predict query q using the SVMs trained at the root level, then continue at each predicted child node
• Stop if reaching a leaf node or the “Other” category
[Figure: query q descending the taxonomy from the Root through nodes m1 to m4 and c1 to c3, stopping at a leaf or at “Other”]
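The top-down procedure can be sketched as a short recursion. This is a minimal Python sketch under stated assumptions: the per-node predictor is a hypothetical callable standing in for that node's One-vs-Rest SVMs plus its MetaLabeler meta model, and the toy taxonomy and decisions are made up.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]


def predict_top_down(
    query: str,
    children: Dict[Path, List[str]],
    node_predict: Callable[[Path, str], List[str]],
    node: Path = (),
) -> List[Path]:
    """Return the category paths at which `query` stops."""
    if not children.get(node):  # leaf node: stop here
        return [node]
    stops: List[Path] = []
    for child in node_predict(node, query):
        if child == "Other":    # "Other": stop at the current node
            stops.append(node)
        else:                   # descend into the predicted child
            stops.extend(predict_top_down(query, children, node_predict, node + (child,)))
    return stops


# Toy taxonomy: Root -> {A, B}, A -> {A1, A2}; B, A1 and A2 are leaves.
children = {(): ["A", "B"], ("A",): ["A1", "A2"]}

# Hypothetical per-node decisions (label sets chosen by SVMs + MetaLabeler).
decisions = {(): ["A", "B"], ("A",): ["A2", "Other"]}


def toy_node_predict(node: Path, query: str) -> List[str]:
    return decisions[node]


print(predict_top_down("q", children, toy_node_predict))
# [('A', 'A2'), ('A',), ('B',)]
```

Because each node only chooses among its own children, every meta model sees a small label space, which fits the claim that a meta model is simply built at each internal node.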
Hierarchical Model + MetaLabeler
• Precision decreases by 1-2%, but recall improves by 10% at deeper levels.
[Figure: performance at each taxonomy depth (1 to 8); curves: MetaLabeler-Precision, MetaLabeler-Recall, SVM-Precision, SVM-Recall]
Features in MetaLabeler
Feature: Related Categories
• Overstock.com
  – Mass Merchants/…/discount department stores
  – Apparel & Jewelry
  – Electronics & Appliances
  – Home & Garden
  – Books-Movies-Music-Tickets
• Blizzard
  – Toys & Hobbies/…/Video Game
  – Computing/…/Computer Game Software
  – Entertainment & Social Event/…/Fast Food Restaurant
  – Reference/News/Weather Information
• Threading
  – Books-Movies-Music-Tickets/…/Computing Books
  – Computing/…/Programming
  – Health and Beauty/…/Unwanted Hair
  – Toys and Hobbies/…/Sewing
Conclusions & Future Work
• MetaLabeler is promising for large-scale multi-label classification
  – Core idea: learn a meta model to predict the number of labels
  – Simple, efficient and scalable
  – Uses existing SVM software directly
  – Easy for practical deployment
• Future work
  – How to optimize MetaLabeler for a desired performance level?
    • E.g. > 95% precision
  – Application to social-networking-related tasks
Questions?
References
• Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., and Ma, W. 2005. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. Newsl. 7, 1 (Jun. 2005), 36-43.
• Rifkin, R. and Klautau, A. 2004. In Defense of One-Vs-All Classification. J. Mach. Learn. Res. 5 (Dec. 2004), 101-141.
• Yang, Y. 2001. A study of thresholding strategies for text categorization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM, New York, NY, 137-145.