Unsupervised and Transfer Learning Challenge
Isabelle Guyon
Clopinet, California
IJCNN 2011, San Jose, California, Jul. 31 - Aug. 5, 2011
Credits

Challenge protocol and implementation:
• Web platform: server made available by Prof. Joachim Buhmann, ETH Zurich, Switzerland.
• Computer administration: Peter Schueffler.
• Webmaster: Olivier Guyon, MisterP.net, France.
Protocol review and advising:
• David W. Aha, Naval Research Laboratory, USA.
• Gideon Dror, Academic College of Tel-Aviv Yaffo, Israel.
• Vincent Lemaire, Orange Research Labs, France.
• Gavin Cawley, University of East Anglia, UK.
• Olivier Chapelle, Yahoo!, California, USA.
• Gerard Rinkus, Brandeis University, USA.
• Ulrike von Luxburg, MPI, Germany.
• David Grangier, NEC Labs, USA.
• Andrew Ng, Stanford University, Palo Alto, California, USA.
• Graham Taylor, NYU, New York, USA.
• Quoc V. Le, Stanford University, USA.
• Yann LeCun, NYU, New York, USA.
• Danny Silver, Acadia University, Canada.
Beta testing and baseline methods:
• Gideon Dror, Academic College of Tel-Aviv Yaffo, Israel.
• Vincent Lemaire, Orange Research Labs, France.
• Gregoire Montavon, TU Berlin, Germany.
Data donors:
• Handwriting recognition (AVICENNA) -- Reza Farrahi Moghaddam, Mathias Adankon, Kostyantyn Filonenko, Robert Wisnovsky, and Mohamed Chériet (École de technologie supérieure de Montréal, Quebec) contributed the dataset of Arabic manuscripts. The toy example (ULE) is the MNIST handwritten digit database made available by Yann LeCun and Corinna Cortes.
• Object recognition (RITA) -- Antonio Torralba, Rob Fergus, and William T. Freeman collected and made publicly available the 80 million tiny images dataset. Vinod Nair and Geoffrey Hinton collected and made publicly available the CIFAR datasets. See the tech report Learning Multiple Layers of Features from Tiny Images, by Alex Krizhevsky, 2009, for details.
• Human action recognition (HARRY) -- Ivan Laptev and Barbara Caputo collected and made publicly available the KTH human action recognition datasets. Marcin Marszałek, Ivan Laptev, and Cordelia Schmid collected and made publicly available the Hollywood 2 dataset of human actions and scenes.
• Text processing (TERRY) -- David Lewis formatted and made publicly available the RCV1-v2 Text Categorization Test Collection.
• Ecology (SYLVESTER) -- Jock A. Blackard, Denis J. Dean, and Charles W. Anderson of the US Forest Service, USA, collected and made available the forest cover type dataset.
Vocabulary

[Figure: schematic contrasting a source task (with source task labels) and a target task (with target task labels). Guiding questions: Are the domains the same? Are labels available? Are the tasks the same?]
Taxonomy of transfer learning
Adapted from: A Survey on Transfer Learning, Pan & Yang, 2010.

Transfer Learning
• No labels in either the source or the target domain → Unsupervised TL
• Labels available ONLY in the source domain → Semi-supervised TL
  - Same source and target task → Transductive TL
  - Different source and target tasks → Cross-task TL
• Labels available in the target domain → Inductive TL
  - No labels in the source domain → Self-taught TL
  - Labels available in the source domain → Multi-task TL
Challenge setting

[Same taxonomy as above, with the challenge settings highlighted: Phase 1 corresponds to unsupervised TL (no labels in either domain); Phase 2 corresponds to cross-task TL (labels available only in the source domain, for tasks different from the target tasks).]
Dec. 2010 - April 2011
http://clopinet.com/ul
• Goal: Learning data representations or kernels.
• Phase 1: Unsupervised learning (Dec. 25, 2010 - Mar. 3, 2011)
• Phase 2: Cross-task transfer learning (Mar. 4, 2011 - Apr. 15, 2011)
• Prizes: $6000 + free registrations + travel awards
• Dissemination: ICML and IJCNN. Proceedings in JMLR W&CP.
[Figure: challenge workflow. Competitors compute data representations from the development, validation, and challenge data. Evaluators use the validation target task labels and the challenge target task labels to score those representations.]
[Figure: the same workflow; in Phase 2, source task labels are additionally made available to the competitors.]
AUC score
For each set of samples queried, we assess the predictions of the learning machine with the area under the ROC curve (AUC).
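The AUC can be computed directly from the ranks of the prediction scores via the Mann-Whitney identity. A minimal sketch (not the challenge platform's own scoring code):

```python
import numpy as np
from scipy.stats import rankdata

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.

    scores: real-valued predictions (higher = more positive).
    labels: array of +1 / -1 class labels.
    """
    scores = np.asarray(scores, dtype=float)
    pos = np.asarray(labels) > 0
    n_pos, n_neg = pos.sum(), (~pos).sum()
    ranks = rankdata(scores)                       # average ranks handle ties
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)                     # fraction of correctly ordered pairs

# Example: perfect separation gives AUC = 1.0
print(auc([0.9, 0.8, 0.3, 0.1], [+1, +1, -1, -1]))
```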
Area under the Learning Curve (ALC)
The learning curve plots the AUC as a function of the number of labeled examples queried; the curve is built with linear interpolation between measured points and horizontal extrapolation beyond the last point.
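A minimal sketch of the ALC computation under these rules, assuming the AUC is measured at geometrically increasing numbers of labeled examples and integrated on a log2 x-axis; the normalization shown here (random guessing at AUC = 0.5 scores 0, a perfect learner scores 1) is an assumption and may differ from the platform's exact constants:

```python
import numpy as np

def alc(n_labeled, auc_scores):
    """Normalized area under the learning curve (a sketch).

    n_labeled: increasing numbers of labeled examples queried.
    auc_scores: AUC measured at each of those points.
    Linear interpolation between points on a log2 x-axis; the
    horizontal extrapolation beyond the last point is omitted.
    """
    x = np.log2(np.asarray(n_labeled, dtype=float))
    y = np.asarray(auc_scores, dtype=float)
    raw = np.trapz(y, x)                       # linear interpolation
    best = np.trapz(np.ones_like(y), x)        # perfect learner (AUC = 1)
    rand = np.trapz(np.full_like(y, 0.5), x)   # random guessing (AUC = 0.5)
    return (raw - rand) / (best - rand)

# Example: AUC measured after querying 1, 2, 4, ..., 64 labels.
print(alc([1, 2, 4, 8, 16, 32, 64], [0.55, 0.62, 0.70, 0.78, 0.84, 0.88, 0.90]))
```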
Classifier used
• Linear discriminant:
  f(x) = w · x = Σ_i w_i x_i
• Hebbian learning:
  X = (p, N) training data matrix
  Y ∈ {–1/p–, +1/p+}^p target vector
  (p+ and p– are the numbers of positive and negative examples)
  w = X' Y
    = (1/p+) Σ_{k∈pos} x_k – (1/p–) Σ_{k∈neg} x_k
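In code, the Hebbian rule reduces to a weighted difference of class means. A minimal numpy sketch (function names are mine, for illustration):

```python
import numpy as np

def hebbian_discriminant(X, y):
    """Train the Hebbian linear discriminant w = X'Y from the slide.

    X: (p, N) matrix of p training examples with N features.
    y: (p,) array of +1 / -1 class labels.
    Returns the weight vector w, i.e. the difference of class means.
    """
    pos, neg = y > 0, y < 0
    target = np.where(pos, 1.0 / pos.sum(), -1.0 / neg.sum())  # Y in {-1/p-, +1/p+}
    return X.T @ target                                        # w = X'Y

def predict(w, X):
    """Decision values f(x) = w . x for each row of X."""
    return X @ w
```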
Kernel version
• Kernel classifier:
  f(x) = Σ_k α_k k(x_k, x)
  with a linear kernel k(x_k, x) = x_k · x
  and with α_k = –1/p– if x_k is a negative example,
           α_k = +1/p+ if x_k is a positive example.
• Equivalent linear discriminant:
  f(x) = (1/p+) Σ_{k∈pos} x_k · x – (1/p–) Σ_{k∈neg} x_k · x
       = w · x
  with w = (1/p+) Σ_{k∈pos} x_k – (1/p–) Σ_{k∈neg} x_k
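A minimal sketch of the kernel form; with the default linear kernel the decision values match w · x from the Hebbian rule above, and a different kernel can be substituted through the kernel argument:

```python
import numpy as np

def kernel_decision(X_train, y, X_test, kernel=lambda A, B: A @ B.T):
    """f(x) = sum_k alpha_k k(x_k, x), with alpha_k = +1/p+ or -1/p-.

    The default is the linear kernel k(x_k, x) = x_k . x.
    kernel(A, B) must return the (len(A), len(B)) Gram matrix.
    """
    pos, neg = y > 0, y < 0
    alpha = np.where(pos, 1.0 / pos.sum(), -1.0 / neg.sum())
    return alpha @ kernel(X_train, X_test)   # one decision value per test row
```

With the linear kernel, kernel_decision(X, y, X_test) equals predict(hebbian_discriminant(X, y), X_test).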
No learning

1) [Figure: the validation data go through a preprocessor P; the preprocessed data are submitted to the challenge platform, where a classifier C is trained with the task labels to produce predictions.]
Select the best preprocessing based on performance on the validation tasks.
2) [Figure: the same pipeline applied to the challenge data.]
Use the same preprocessor for the final evaluation.
Unsupervised transfer learning

1) [Figure: source domain data feed a preprocessor P followed by a reconstructor R.]
Simultaneously train a preprocessor P and a reconstructor R using unlabeled data.
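A minimal sketch of this joint P/R training as a tied-weight linear autoencoder in numpy; the architecture and training details are assumptions (competitors used richer variants, e.g. denoising auto-encoders):

```python
import numpy as np

def train_autoencoder(X, n_hidden=64, lr=0.01, epochs=100, seed=0):
    """Tied-weight linear autoencoder trained on unlabeled data X (p, N).

    P(x) = W x is the preprocessor (the learned representation);
    R(h) = W' h is the reconstructor. Gradient descent on the mean
    squared reconstruction error. All settings here are assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(n_hidden, X.shape[1]))
    for _ in range(epochs):
        H = X @ W.T                  # P: hidden representation
        err = H @ W - X              # R's reconstruction error
        grad = (H.T @ err + (err @ W.T).T @ X) / len(X)
        W -= lr * grad
    return W                         # keep P (W); R is discarded

# The target-domain representation is then X_target @ W.T.
```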
2) [Figure: target domain data go through the same preprocessor P; a classifier C is trained with the target task labels.]
Use the same preprocessor for the evaluation on target domains.
Supervised data representation learning

1) [Figure: source domain data go through a preprocessor P and a classifier C trained with the source task labels.]
Simultaneously train a preprocessor P and a classifier C with labeled source domain data.
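A minimal sketch of joint P/C training as a one-hidden-layer network with a logistic output, trained by gradient descent; architecture and hyper-parameters are illustrative assumptions, not the method of any particular team:

```python
import numpy as np

def train_supervised_rep(X, y, n_hidden=64, lr=0.1, epochs=200, seed=0):
    """Jointly train preprocessor P (hidden layer) and classifier C
    (logistic output) on labeled source data; keep only P afterwards.

    X: (p, N) source domain data; y: (p,) labels in {+1, -1}.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))  # P's weights
    w2 = rng.normal(scale=0.1, size=n_hidden)                # C's weights
    t = (y > 0).astype(float)                                # 0/1 targets
    for _ in range(epochs):
        H = np.tanh(X @ W1)                      # representation P(x)
        p = 1.0 / (1.0 + np.exp(-(H @ w2)))      # classifier C's output
        g = (p - t) / len(X)                     # logistic-loss gradient
        delta = np.outer(g, w2) * (1.0 - H**2)   # backprop through tanh
        w2 -= lr * (H.T @ g)
        W1 -= lr * (X.T @ delta)
    return W1                                    # P: x -> tanh(x @ W1); C is discarded
```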
2) [Figure: target domain data go through the same preprocessor P; a new classifier C is trained with the target task labels.]
Use the same preprocessor for the evaluation on target domains.
Variants
• Use all or subsets of data for training (development/validation/challenge data).
• Learn which preprocessing steps to apply using the validation data (not the preprocessor itself), then apply the same method to the challenge data.
• Learn to reconstruct noisy versions of the data.
• Train a kernel instead of a preprocessor.
Questions
• Can Transfer Learning beat raw data (or simple preprocessing)?
• Does Deep Learning work?
• Do labels help (does cross-task TL beat unsupervised TL)?
• Is model selection possible in TL?
• Did consistent TL methodologies emerge?
• Do the results make sense?
• Is there code available?
Can transfer learning beat raw data?
Phase 1 (6933 jobs submitted, 41 complete final entries)
Phase 2 (1141 jobs submitted, 14 complete final entries)
Does “Deep Learning” work?
[Figure: evolution of performance as a function of depth on SYLVESTER. LISA team, 1st in phase 2, 4th in phase 1.]
Is model selection possible?
[Figure: model selection results for Phase 1 and Phase 2.]
Use of "transfer labels" as a selection criterion (LISA team).
Did consistent methodologies emerge?
Bottom layers: Preprocessing and feature selection
A few things that worked well
• Learn the preprocessing steps (not the preprocessor) – Aiolli, 1st phase 1.
• As first steps: eliminate low-information features, or keep the largest principal components and sphere the data; normalize and/or standardize.
• Learn denoising or contractive auto-encoders or RBMs – LISA team, 1st phase 2.
• Use cluster memberships from multiple K-means runs – 1055A team, 2nd phase 1 and 3rd phase 2 (see the sketch after this list).
• Transductive PCA (as last step) – LISA.
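A minimal sketch of the cluster-membership idea using scikit-learn; it only illustrates the principle of concatenating memberships from several K-means runs, not the 1055A team's actual pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_membership_features(X, n_runs=5, n_clusters=10):
    """Run K-means several times with different seeds and concatenate
    one-hot cluster memberships into a new data representation.
    n_runs and n_clusters are illustrative assumptions.
    """
    feats = []
    for seed in range(n_runs):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
        onehot = np.eye(n_clusters)[km.labels_]   # (p, n_clusters) memberships
        feats.append(onehot)
    return np.hstack(feats)                       # (p, n_runs * n_clusters)
```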
Conclusion
• UL: This challenge demonstrated the potential of unsupervised learning methods used as preprocessing for supervised learning tasks.
• UTL: Model selection of UL hyper-parameters can be carried out with "source tasks" similar to the "target tasks".
• DL: Multi-step preprocessing leading to deep architectures can be trained in a greedy, bottom-up, step-wise manner.
• Favorite methods include normalizations, PCA, clustering, and auto-encoders.
• A kernel method won phase 1 and a Deep Learning method won phase 2.
Challenge
June 2011 - June 2012
http://gesture.chalearn.org
• STEP 1: Develop a "generic" gesture recognition system that can learn new signs with a few examples.
• STEP 2: At the conference: teach the system new signs.
• STEP 3: Live evaluation in front of the audience.