A two-step method to incorporate task features for large output spaces

Date post: 15-Apr-2017
Upload: michiel-stock
Michiel Stock¹, Tapio Pahikkala², Antti Airola², Bernard De Baets¹ & Willem Waegeman¹

¹ KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
² Department of Computer Science, University of Turku

NIPS: extreme classification workshop, December 12, 2015
Twitter: @michielstock

Outline
Motivation: introductory example, relational learning, other applications
Pairwise learning methods: Kronecker kernel ridge regression, two-step kernel ridge regression
Computational aspects: cross-validation, exact online learning
Take home messages


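What will we read next?

Observed ratings (rows: books, columns: readers; ? = not observed)

          Alice   Bob   Cedric   Daphne
Book 1      5      ?      4        ?
Book 2      1      ?      ?        4
Book 3      ?      4      ?        ?
Book 4      2      ?      1        ?
Book 5      4      ?      3        ?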


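Social graph [figure: relations between the readers]

Genre features of the books (one binary indicator per genre):
Book 1   1 1 0 1
Book 2   0 0 1 0
Book 3   1 1 0 0
Book 4   0 0 1 0
Book 5   0 1 0 1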


What will we read next? Filling in the missing ratings

          Alice   Bob   Cedric   Daphne
Book 1      5     2.3     4       3.1
Book 2      1     4.5    1.3       4
Book 3     3.9     4     3.8      0.8
Book 4      2     5.2     1       4.5
Book 5      4     2.5     3       3.6


New book with genre features 1 1 0 1: predicted ratings 4.8 (Alice), 1.1 (Bob), 3.7 (Cedric), 2.3 (Daphne)


New reader Eric: predicted ratings 2.3 (Book 1), 4.0 (Book 2), 1.7 (Book 3), 4.8 (Book 4), 2.9 (Book 5)


Predicted rating of the new book by the new reader Eric: 2.4


Learning relations

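Training data: in-sample instances × in-sample tasks. Predictions are needed in four settings:

Setting A: in-sample instances, in-sample tasks
Setting B: out-of-sample instances, in-sample tasks
Setting C: in-sample instances, out-of-sample tasks
Setting D: out-of-sample instances, out-of-sample tasks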
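To make the four settings concrete, the sketch below labels every entry of an instances × tasks label matrix as train, test or discarded when instance i and/or task j are held out. It assumes Setting A = known instance and known task, B = new instance, C = new task, D = both new; the helper `holdout_mask` and its integer codes are hypothetical, not the authors' protocol. In Settings B and C the whole row or column is scored at once.

```python
import numpy as np

TRAIN, TEST, DISCARD = 0, 1, 2  # hypothetical integer codes for the role of a dyad

def holdout_mask(n, m, i, j, setting):
    """Role of every dyad in an n x m (instances x tasks) label matrix when
    instance i and/or task j are held out. Hypothetical helper; assumes
    A = known instance and task, B = new instance, C = new task, D = both new."""
    mask = np.full((n, m), TRAIN)
    if setting == "A":
        mask[i, j] = TEST          # only the dyad itself is hidden
    elif setting == "B":
        mask[i, :] = TEST          # instance i never appears in training
    elif setting == "C":
        mask[:, j] = TEST          # task j never appears in training
    elif setting == "D":
        mask[i, :] = DISCARD       # dyads sharing instance i cannot be used...
        mask[:, j] = DISCARD       # ...nor dyads sharing task j
        mask[i, j] = TEST
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return mask

print(holdout_mask(3, 4, 0, 1, "D"))
# [[2 1 2 2]
#  [0 2 0 0]
#  [0 2 0 0]]
```

Only Setting D produces discarded dyads: any training dyad sharing the held-out instance or task would leak information about it.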


Other cool applications: drug design

Predicting interactions between proteins and small compounds


Other cool applications: social network analysis

Predicting links between people


Other cool applications: food pairing

Finding ingredients that pair well


Learning with pairwise feature representations

[Figure: features of the books ⊗ features of the readers]

$d$: instance (e.g. book)
$\phi(d)$: instance features (e.g. genre)
$t$: task (e.g. reader)
$\psi(t)$: task features (e.g. social network)

Pairwise prediction function: $f(d, t) = w^\top(\phi(d) \otimes \psi(t))$
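The Kronecker feature vector has one entry per pair of features, so in practice it is rarely formed explicitly. A minimal NumPy sketch of the identity $w^\top(\phi(d) \otimes \psi(t)) = \phi(d)^\top W \psi(t)$, where $w$ is the matrix $W$ flattened row by row; the sizes `p`, `q` and the random features are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 3                       # hypothetical: 4 instance features, 3 task features
phi_d = rng.random(p)             # phi(d), e.g. genre indicators of a book
psi_t = rng.random(q)             # psi(t), e.g. social-network features of a reader
W = rng.standard_normal((p, q))   # parameters arranged as a p x q matrix

pair_features = np.kron(phi_d, psi_t)   # phi(d) kron psi(t), length p * q
w = W.ravel()                           # row-major flattening matches np.kron's ordering

f_kron = w @ pair_features              # w^T (phi(d) kron psi(t))
f_bilinear = phi_d @ W @ psi_t          # phi(d)^T W psi(t), no p * q vector needed
print(np.isclose(f_kron, f_bilinear))   # True
```

The bilinear form needs only $O(pq)$ work per prediction and never materializes the pairwise vector, which matters when both feature sets are large.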


Learning relations in two steps

[Figure: in-sample and out-of-sample instances × in-sample and out-of-sample tasks, connected by an instance KRR, virtual instances and a task KRR]

1. Build a ridge regression model to generalize to new instances.
2. Build a ridge regression model to generalize to new tasks.
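A sketch of the two steps with plain linear ridge regression (the slides use kernel ridge regression; linear features only keep the example short). The predictions of the first model for a new instance, one per training task, serve as training labels for the second model. The helper `ridge_fit` and all data are hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Ridge coefficients (X^T X + lam I)^{-1} X^T Y, without intercept."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

rng = np.random.default_rng(1)
n, m, p, q = 30, 20, 5, 4          # hypothetical: n instances, m tasks, p / q features
Phi = rng.standard_normal((n, p))  # instance features, one row per training instance
Psi = rng.standard_normal((m, q))  # task features, one row per training task
Y = rng.standard_normal((n, m))    # Y[i, j]: label of the pair (instance i, task j)
lam_d, lam_t = 1.0, 0.5

phi_new = rng.standard_normal(p)   # an unseen instance (e.g. a new book)
psi_new = rng.standard_normal(q)   # an unseen task (e.g. a new reader)

# Step 1: instance ridge model with one output per training task
B_d = ridge_fit(Phi, Y, lam_d)            # shape (p, m)
virtual_labels = phi_new @ B_d            # labels of the new instance for all m tasks

# Step 2: task ridge model trained on those labels, then applied to the new task
b_t = ridge_fit(Psi, virtual_labels, lam_t)   # shape (q,)
prediction = psi_new @ b_t
print(prediction)
```

Because both steps are linear, the result equals $\phi(d)^\top W \psi(t)$ for a single parameter matrix $W$, so the order of the two steps does not matter.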


The two-step ridge regression

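Prediction function:

$f(d, t) = \phi(d)^\top W \psi(t)$

With $\Phi$ and $\Psi$ the feature matrices of the training instances and tasks (one row each) and $Y$ the label matrix (instances × tasks), the parameters can be found by solving:

$\Phi^\top Y \Psi = (\Phi^\top \Phi + \lambda_d I)\, W\, (\Psi^\top \Psi + \lambda_t I)$

Two hyperparameters: $\lambda_d$ and $\lambda_t$!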
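For $\lambda_d, \lambda_t > 0$ both bracketed matrices are positive definite, so with $\Phi$ ($n \times p$), $\Psi$ ($m \times q$) and $Y$ ($n \times m$) the equation has the closed-form solution $W = (\Phi^\top\Phi + \lambda_d I)^{-1}\Phi^\top Y \Psi(\Psi^\top\Psi + \lambda_t I)^{-1}$, i.e. two linear solves. A minimal NumPy sketch on random data; the helper names `two_step_ridge` and `predict` are hypothetical.

```python
import numpy as np

def two_step_ridge(Phi, Psi, Y, lam_d, lam_t):
    """Solve Phi^T Y Psi = (Phi^T Phi + lam_d I) W (Psi^T Psi + lam_t I) for W."""
    p, q = Phi.shape[1], Psi.shape[1]
    A = Phi.T @ Phi + lam_d * np.eye(p)          # p x p, positive definite
    C = Psi.T @ Psi + lam_t * np.eye(q)          # q x q, positive definite
    left = np.linalg.solve(A, Phi.T @ Y @ Psi)   # A^{-1} Phi^T Y Psi
    # Right-multiply by C^{-1}: C is symmetric, so X C^{-1} = (C^{-1} X^T)^T
    return np.linalg.solve(C, left.T).T

def predict(W, phi_d, psi_t):
    """f(d, t) = phi(d)^T W psi(t); also accepts matrices with one row per item."""
    return phi_d @ W @ psi_t.T

rng = np.random.default_rng(2)
Phi = rng.standard_normal((30, 5))   # hypothetical training instances x features
Psi = rng.standard_normal((20, 4))   # hypothetical training tasks x features
Y = rng.standard_normal((30, 20))    # hypothetical label matrix
lam_d, lam_t = 1.0, 0.1

W = two_step_ridge(Phi, Psi, Y, lam_d, lam_t)
F = predict(W, Phi, Psi)             # 30 x 20 in-sample predictions

# Check that W satisfies the equation on the slide
lhs = Phi.T @ Y @ Psi
rhs = (Phi.T @ Phi + lam_d * np.eye(5)) @ W @ (Psi.T @ Psi + lam_t * np.eye(4))
print(np.allclose(lhs, rhs))  # True
```

Only a $p \times p$ and a $q \times q$ system are solved, so the cost does not grow with the product of the numbers of instances and tasks.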


Four ways of cross-validation

[Figure: one panel per setting (A, B, C, D), marking each dyad as train, test or discarded]

Analytic shortcuts can be derived to perform LOOCV for each setting!

Tuning $\lambda_d$ and $\lambda_t$ is essentially free!
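One such shortcut, sketched for leaving out a single instance (a whole row of $Y$) while keeping all tasks. The two-step in-sample predictions are $F = H_d Y H_t$ with ridge hat matrices $H_d = \Phi(\Phi^\top\Phi + \lambda_d I)^{-1}\Phi^\top$ and $H_t = \Psi(\Psi^\top\Psi + \lambda_t I)^{-1}\Psi^\top$; applying the standard ridge leave-one-out identity to $Z = Y H_t$ gives row $i$ as $(F_{i\cdot} - [H_d]_{ii} Z_{i\cdot}) / (1 - [H_d]_{ii})$. This derivation is for illustration and not necessarily the authors' formula; the code checks it against explicit refitting. Function names and data are hypothetical.

```python
import numpy as np

def hat_matrix(X, lam):
    """Ridge hat matrix X (X^T X + lam I)^{-1} X^T: maps labels to fitted values."""
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)

def loo_instances(Phi, Psi, Y, lam_d, lam_t):
    """Leave-one-instance-out predictions of two-step ridge regression:
    row i comes from a model trained without instance i (all tasks kept)."""
    H_d = hat_matrix(Phi, lam_d)
    H_t = hat_matrix(Psi, lam_t)
    Z = Y @ H_t                    # task step, applied to every row independently
    h = np.diag(H_d)[:, None]      # leverage of each instance
    return (H_d @ Z - h * Z) / (1.0 - h)

rng = np.random.default_rng(3)
Phi = rng.standard_normal((15, 4))   # random, illustrative data
Psi = rng.standard_normal((10, 3))
Y = rng.standard_normal((15, 10))
lam_d, lam_t = 0.5, 1.0

F_loo = loo_instances(Phi, Psi, Y, lam_d, lam_t)

# Brute force for comparison: refit without instance i, then predict its row
F_brute = np.empty_like(Y)
p, q = Phi.shape[1], Psi.shape[1]
for i in range(Y.shape[0]):
    keep = np.arange(Y.shape[0]) != i
    A = Phi[keep].T @ Phi[keep] + lam_d * np.eye(p)
    W = np.linalg.solve(A, Phi[keep].T @ Y[keep] @ Psi)
    W = np.linalg.solve(Psi.T @ Psi + lam_t * np.eye(q), W.T).T
    F_brute[i] = Phi[i] @ W @ Psi.T

print(np.allclose(F_loo, F_brute))  # True
```

The shortcut costs one fit instead of one refit per instance.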


Effect of regularization for the four settings

Data: protein-ligand interactions (nr data). Evaluation by AUC (lighter = better performance).

[Figure: four heatmaps of the AUC as a function of lambda drugs and lambda targets, titled "nr data CV for Setting A", "nr data CV for Setting B", "nr data CV for Setting C" and "nr data CV for Setting D"; color scale from 0.50 to 1.00]

Clear differences between the four settings in how the AUC depends on $\lambda_d$ and $\lambda_t$!
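Why tuning can be nearly free: in kernel form $H_d = K(K + \lambda_d I)^{-1} = U \operatorname{diag}(s/(s+\lambda_d)) U^\top$ for the eigendecomposition $K = U \operatorname{diag}(s) U^\top$ (and likewise for the task kernel $G$), so one eigendecomposition per kernel serves a whole grid of $(\lambda_d, \lambda_t)$. The sketch below scores the leave-one-instance-out error on such a grid using the standard ridge leave-one-out identity; `loo_grid_instances`, the linear kernels and the synthetic data are assumptions for illustration.

```python
import numpy as np

def loo_grid_instances(K, G, Y, lambdas_d, lambdas_t):
    """Mean squared leave-one-instance-out error of two-step kernel ridge
    regression for every (lam_d, lam_t), reusing one eigendecomposition per kernel."""
    s, U = np.linalg.eigh(K)   # instance kernel K = U diag(s) U^T  (n x n)
    t, V = np.linalg.eigh(G)   # task kernel     G = V diag(t) V^T  (m x m)
    # G (G + lam_t I)^{-1} for every lam_t, computed once
    H_ts = [(V * (t / (t + lam_t))) @ V.T for lam_t in lambdas_t]
    errors = np.empty((len(lambdas_d), len(lambdas_t)))
    for a, lam_d in enumerate(lambdas_d):
        H_d = (U * (s / (s + lam_d))) @ U.T      # K (K + lam_d I)^{-1}
        h = np.diag(H_d)[:, None]                # leverage of each instance
        for b, H_t in enumerate(H_ts):
            Z = Y @ H_t
            F_loo = (H_d @ Z - h * Z) / (1.0 - h)
            errors[a, b] = np.mean((Y - F_loo) ** 2)
    return errors

rng = np.random.default_rng(5)
Phi = rng.standard_normal((40, 6))
Psi = rng.standard_normal((25, 4))
Y = Phi @ rng.standard_normal((6, 4)) @ Psi.T + 0.5 * rng.standard_normal((40, 25))
K, G = Phi @ Phi.T, Psi @ Psi.T        # linear kernels, only for illustration
lambdas = np.logspace(-3, 3, 13)

errors = loo_grid_instances(K, G, Y, lambdas, lambdas)
best_d, best_t = np.unravel_index(np.argmin(errors), errors.shape)
print("best lambda_d:", lambdas[best_d], "best lambda_t:", lambdas[best_t])
```

Each grid point then costs a few matrix products rather than a new eigendecomposition or refit.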


Learning with mini-batches

[Figure: label matrix of instances × tasks, extended from the initial training data with new training instances, even more training instances, and new training tasks]

Exact updating of the parameters when new training instances and/or tasks become available:
scalable for “Big Data” applications
updating the model in a dynamic environment
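One way to obtain exact mini-batch updates in the linear (primal) form: keep the sufficient statistics $\Phi^\top\Phi$, $\Psi^\top\Psi$ and $\Phi^\top Y \Psi$, add the contribution of each batch, and re-solve for $W$. This assumes complete labels for every new instance (over all current tasks) or new task (over all current instances). It is a sketch, not the authors' update scheme, and the class name `TwoStepRidgeOnline` is hypothetical.

```python
import numpy as np

class TwoStepRidgeOnline:
    """Hypothetical helper: exact two-step ridge parameters under mini-batch
    updates, via sufficient statistics (primal form, complete label matrix)."""

    def __init__(self, Phi, Psi, Y, lam_d, lam_t):
        self.Phi, self.Psi = Phi, Psi
        self.lam_d, self.lam_t = lam_d, lam_t
        self.A = Phi.T @ Phi          # Phi^T Phi   (p x p)
        self.C = Psi.T @ Psi          # Psi^T Psi   (q x q)
        self.M = Phi.T @ Y @ Psi      # Phi^T Y Psi (p x q)

    def add_instances(self, Phi_new, Y_new):
        """Y_new: labels of the new instances for all current tasks."""
        self.A += Phi_new.T @ Phi_new
        self.M += Phi_new.T @ Y_new @ self.Psi
        self.Phi = np.vstack([self.Phi, Phi_new])

    def add_tasks(self, Psi_new, Y_new):
        """Y_new: labels of all current instances for the new tasks."""
        self.C += Psi_new.T @ Psi_new
        self.M += self.Phi.T @ Y_new @ Psi_new
        self.Psi = np.vstack([self.Psi, Psi_new])

    def weights(self):
        p, q = self.A.shape[0], self.C.shape[0]
        left = np.linalg.solve(self.A + self.lam_d * np.eye(p), self.M)
        return np.linalg.solve(self.C + self.lam_t * np.eye(q), left.T).T

rng = np.random.default_rng(4)
Phi = rng.standard_normal((25, 5))   # random, illustrative data
Psi = rng.standard_normal((12, 3))
Y = rng.standard_normal((25, 12))

online = TwoStepRidgeOnline(Phi[:15], Psi[:8], Y[:15, :8], lam_d=0.5, lam_t=2.0)
online.add_instances(Phi[15:], Y[15:, :8])   # mini-batch of 10 new instances
online.add_tasks(Psi[8:], Y[:, 8:])          # 4 new tasks, labelled for all 25 instances
batch = TwoStepRidgeOnline(Phi, Psi, Y, lam_d=0.5, lam_t=2.0)
print(np.allclose(online.weights(), batch.weights()))  # True
```

The stored feature matrices are only needed to project the labels of later batches; the solves themselves are $p \times p$ and $q \times q$.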


Exact online learning for hierarchical text classification

Hierarchical text classification (> 12,000 labels): from 5,000 to 350,000 instances, in steps of 1,000 instances.


Why two-step ridge regression?

Zero-shot learning, transfer learning, multi-task learning... in one line of code

Theoretically well founded

Allows for nifty computational tricks:
‘free’ tuning of the hyperparameters
‘free’ LOOCV for all four settings!
closed-form solution for updating with mini-batches
