Online and Batch Learning of Pseudo-Metrics

Shai Shalev-Shwartz

Hebrew University, Jerusalem

Joint work with

Yoram Singer, Google Inc.

Andrew Y. Ng, Stanford University


Motivating Example


Our Technique

• Map instances into a space in which distances correspond to labels


Outline

• Distance learning setting

• Large margin for distances

• An online learning algorithm

• Online loss analysis

• A dual version

• Experiments:
  • Online - document filtering
  • Batch - handwritten digit recognition


Problem Setting

• Training examples:
  • two instances
  • similarity label

• Hypothesis class: pseudo-metrics, each parameterized by a symmetric positive semi-definite matrix
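
A minimal sketch of the hypothesis class above, assuming the pseudo-metric takes the usual quadratic form d_A(x, x')^2 = (x - x')^T A (x - x') for a symmetric positive semi-definite matrix A. The quadratic form and the function name `sq_dist` are assumptions made for illustration, not taken from the slide.

```python
def sq_dist(A, x, xp):
    """Squared pseudo-metric distance (x - xp)^T A (x - xp).

    A is a symmetric PSD matrix given as a list of rows (assumed form).
    """
    v = [a - b for a, b in zip(x, xp)]
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


# A rank-one PSD matrix that ignores the second coordinate entirely.
A = [[1.0, 0.0],
     [0.0, 0.0]]

print(sq_dist(A, [0.0, 0.0], [3.0, 4.0]))  # only the first coordinate counts
print(sq_dist(A, [1.0, 2.0], [1.0, 7.0]))  # distinct points at distance zero
```

The second call shows why these are pseudo-metrics rather than metrics: when A is singular, two different instances can be at distance zero.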


Large Margin for Pseudo-Metrics

• Sample S is -separated w.r.t. a metric


Batch Formulation

s.t.

s.t.


Pseudo-metric Online Learning Algorithm (POLA)

For each round:

• Get two instances

• Calculate distance

• Predict

• Get true label and suffer hinge-loss

• Update matrix and threshold

If: we want that

If: we want that
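
The prediction and hinge-loss steps above can be sketched as follows. This is a sketch under stated assumptions: labels are +1 (similar) and -1 (dissimilar), the prediction compares the squared distance with a threshold b, and a unit margin is used, so a similar pair should satisfy d^2 <= b - 1 and a dissimilar pair d^2 >= b + 1. The function names are illustrative.

```python
def predict(sq_d, b):
    """Predict +1 (similar) if the squared distance is at most b, else -1."""
    return 1 if sq_d <= b else -1


def hinge_loss(sq_d, b, y):
    """Hinge loss with unit margin (assumed form).

    Zero when a similar pair (y = +1) has sq_d <= b - 1 or a dissimilar
    pair (y = -1) has sq_d >= b + 1; grows linearly otherwise.
    """
    return max(0.0, y * (sq_d - b) + 1.0)


b = 3.0
print(predict(1.0, b), hinge_loss(1.0, b, +1))  # well inside the margin
print(predict(2.5, b), hinge_loss(2.5, b, +1))  # correct, but margin violated
print(predict(2.5, b), hinge_loss(2.5, b, -1))  # wrong side of the threshold
```

Note that a correct prediction can still incur a positive loss when the pair falls inside the margin around the threshold.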


Core Update: Two Projections

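• Start with the current matrix

• An example defines a half-space: the set of all zero-loss matrices

• The intermediate matrix is the projection of the current matrix onto this half-space

• The new matrix is the projection of the intermediate matrix onto the PSD cone

(Figure: the PSD cone and the half-space of all zero-loss matrices.)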
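
The two projections can be sketched with NumPy as below. This is an illustration under assumptions, not the authors' exact procedure: it assumes the unit-margin hinge loss max(0, y (d^2 - b) + 1), measures distance between (matrix, threshold) pairs by the Frobenius norm plus the squared difference in b, and keeps b >= 1 so that similar pairs can still satisfy d^2 <= b - 1. The name `pola_update` is hypothetical.

```python
import numpy as np


def pola_update(A, b, x, xp, y):
    """One update: half-space projection, then PSD-cone projection."""
    v = x - xp
    loss = max(0.0, y * (v @ A @ v - b) + 1.0)

    # Projection onto the half-space of zero-loss (A, b) pairs.
    # The constraint is linear in (A, b) with squared norm ||v||^4 + 1.
    alpha = loss / ((v @ v) ** 2 + 1.0)
    A_half = A - y * alpha * np.outer(v, v)
    b_half = b + y * alpha

    # Projection onto the PSD cone: clip negative eigenvalues at zero.
    w, U = np.linalg.eigh(A_half)
    A_new = (U * np.clip(w, 0.0, None)) @ U.T
    return A_new, max(b_half, 1.0)  # b >= 1 is an assumed constraint


# A similar pair whose half-space step produces a negative eigenvalue.
A = np.array([[1.0, 1.0], [1.0, 1.0]])
A_new, b_new = pola_update(A, 1.0, np.array([1.0, 0.0]), np.zeros(2), +1)
print(np.linalg.eigvalsh(A_new))
print(b_new)
```

After the second projection the example may again have a small positive loss; the update only guarantees that the result is a valid pseudo-metric that moved toward the zero-loss set.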


Online Learning

• Goal – minimize cumulative loss

• Why Online?
  • Online processing tasks (e.g. Text Filtering)
  • Simple to implement
  • Memory and run-time efficient
  • Worst-case bounds on the performance
  • Online to batch conversions


Online Loss Bound

• sequence of examples s.t.

• any fixed matrix and threshold

• Then,

Loss bound does not depend on dimension

Loss suffered by “Complexity” of


Incorporating Kernels

• Matrix A can be written as ,

where

• Therefore:
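
As a hedged illustration of how kernels can enter, suppose (an assumption made here, not a formula from the slide) that the matrix is a weighted sum of outer products of difference vectors, A = sum_i beta_i v_i v_i^T with v_i = a_i - c_i. The squared distance then depends on the data only through inner products, each of which can be replaced by a kernel evaluation. The names `kernel_sq_dist`, `pairs`, `betas`, and `K` are illustrative.

```python
def kernel_sq_dist(pairs, betas, x, xp, K):
    """Squared distance for A = sum_i beta_i v_i v_i^T, v_i = a_i - c_i.

    Uses only kernel evaluations K(., .), never A itself.
    """
    total = 0.0
    for (a, c), beta in zip(pairs, betas):
        # <a - c, x - xp> expanded into four kernel evaluations.
        ip = K(a, x) - K(a, xp) - K(c, x) + K(c, xp)
        total += beta * ip ** 2
    return total


def linear(u, w):
    """Linear kernel: the ordinary inner product."""
    return sum(ui * wi for ui, wi in zip(u, w))


# With the linear kernel this matches the explicit matrix 2 * e1 e1^T.
pairs = [([1.0, 0.0], [0.0, 0.0])]
betas = [2.0]
print(kernel_sq_dist(pairs, betas, [3.0, 1.0], [1.0, 1.0], linear))
```

Swapping `linear` for any other positive semi-definite kernel gives a pseudo-metric in the corresponding feature space without ever forming the matrix.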


Online Experiments

• Task: Document filtering according to topics

• Dataset: Reuters-21578
  • 10,000 documents
  • Documents labeled as Relevant and Irrelevant
  • A few relevant documents (1% - 10% of entire set)

• Algorithms:
  • POLA
  • 1 Nearest Neighbor (1-NN)
  • Perceptron Algorithm
  • Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)


POLA for Document Filtering

• Get a document

• Calculate distance to relevant documents observed so far using current matrix

• Predict: document is relevant iff the distance to the closest relevant document is smaller than the current threshold

• Get true label

• Update matrix and threshold
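
The filtering loop above can be sketched as follows. This is a sketch under assumptions: `dist` is any distance function standing in for the current matrix, and `update` is a hypothetical hook that returns a new distance function and threshold, since the slide does not say which pairs are fed to the update.

```python
def filter_stream(docs, labels, dist, b, update=None):
    """Online filtering with a nearest-relevant-document rule.

    Predicts relevant iff the closest relevant document observed so far
    is closer than the current threshold b. Returns the predictions.
    """
    relevant, preds = [], []
    for doc, is_rel in zip(docs, labels):
        nearest = min((dist(doc, r) for r in relevant), default=float("inf"))
        preds.append(nearest < b)
        if update is not None:
            # Hypothetical hook: update the metric and the threshold.
            dist, b = update(dist, b, doc, relevant, is_rel)
        if is_rel:
            relevant.append(doc)
    return preds


def sq_euclid(u, w):
    return sum((ui - wi) ** 2 for ui, wi in zip(u, w))


docs = [[0.0], [0.1], [5.0], [0.2]]
labels = [True, True, False, True]
print(filter_stream(docs, labels, sq_euclid, b=1.0))
```

Before any relevant document has been seen the distance is infinite, so the first document is always predicted irrelevant under this rule.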


Document Filtering Results

• Each blue point corresponds to one topic

• Y-axis designates the error of POLA

• Points beneath the black diagonal line mean that POLA wins

(Figure: three scatter plots. X-axes: 1-NN error, Perceptron error, PAUM error. Y-axis: POLA error.)


Batch Experiments

• Task: Handwritten digit recognition

• Dataset: MNIST
  • 45 binary classification problems (all pairs)
  • 10,000 training examples
  • 10,000 test examples

• Algorithms: k-NN with various metrics:
  • Pseudo-metric learned by POLA
  • Euclidean distance
  • Metric induced by Fisher Discriminant Analysis (FDA)
  • Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
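
A minimal sketch of k-NN with a pluggable metric, to show how the same classifier can be run with each of the metrics listed above. The function names and the tiny data set are illustrative assumptions; the matrix-based distance assumes the quadratic form (x - x')^T A (x - x').

```python
from collections import Counter


def knn_predict(train, query, dist, k=1):
    """Majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def sq_euclid(u, w):
    return sum((ui - wi) ** 2 for ui, wi in zip(u, w))


def make_sq_dist(A):
    """Distance function for a matrix A given as a list of rows."""
    def dist(u, w):
        v = [ui - wi for ui, wi in zip(u, w)]
        return sum(v[i] * A[i][j] * v[j]
                   for i in range(len(v)) for j in range(len(v)))
    return dist


train = [([0.0, 10.0], "a"), ([3.0, 0.0], "b")]
query = [0.0, 0.0]

# Euclidean distance versus a metric that ignores the second coordinate.
print(knn_predict(train, query, sq_euclid))
print(knn_predict(train, query, make_sq_dist([[1.0, 0.0], [0.0, 0.0]])))
```

The two calls return different labels for the same query, which is the effect a learned metric is meant to exploit.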


MNIST Results

(Figure: three scatter plots. X-axes: Euclidean distance error, FDA error, RCA error. Y-axis: POLA error.)

RCA was applied after using PCA as a pre-processing step

• Each blue point corresponds to one binary classification problem

• Y-axis designates the error of POLA

• Points beneath the black diagonal line mean that POLA wins


Toy problem

A color-coded matrix of Euclidean distances between pairs of images


Metric found by POLA


Mapping found by POLA

• Our Pseudo-metrics:


Mapping found by POLA


Summary and Extensions

• An online algorithm for learning pseudo-metrics

• Formal properties, good experimental results

Extensions:

• Alternative regularization schemes to the Frobenius norm

• “Learning to learn”: learning a metric from one set of classes and applying it to another set of related classes

