Source: people.cs.uchicago.edu/~vikass/TTItalk.pdf

Manifold Regularization

Vikas Sindhwani

Department of Computer Science

University of Chicago

Joint Work with Mikhail Belkin and Partha Niyogi

TTI-C Talk September 14, 2004 – p.1

The Problem of Learning

Labeled data S = \{(x_i, y_i)\}_{i=1}^{l} \subset X \times Y is drawn from an unknown probability distribution P. A learning algorithm maps S to an element f of a hypothesis space of functions mapping X \to Y. f should provide good labels for future examples.

Regularization: choose a simple function that agrees with the data.

TTI-C Talk September 14, 2004 – p.2

The Problem of Learning

Notions of simplicity are the key to successful learning. Here is a simple function that agrees with the data.

TTI-C Talk September 14, 2004 – p.3

Learning and Prior Knowledge

But simplicity is a relative concept: prior knowledge of the marginal distribution can modify our notions of simplicity.

TTI-C Talk September 14, 2004 – p.4

Motivation

How can we exploit prior knowledge of the marginal distribution P_X? More practically, how can we use unlabeled examples drawn from P_X? Why is this important?

Natural data has structure to exploit. Natural learning is largely semi-supervised. Labels are expensive; unlabeled data is cheap and plentiful.

TTI-C Talk September 14, 2004 – p.5

Contributions

A data-dependent, geometric regularization framework for learning from examples.

Representer theorems provide solutions.

Extensions of SVM and RLS for semi-supervised learning.

Regularized spectral clustering and dimensionality reduction.

The problem of out-of-sample extensions in graph methods is resolved.

Good empirical performance.

TTI-C Talk September 14, 2004 – p.6

Regularization with RKHS

Learning in Reproducing Kernel Hilbert Spaces:

f^* = \arg\min_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma \|f\|_K^2

Regularized Least Squares (RLS): V(x_i, y_i, f) = (y_i - f(x_i))^2

Support Vector Machine (SVM): V(x_i, y_i, f) = \max(0, 1 - y_i f(x_i))

TTI-C Talk September 14, 2004 – p.7
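The RLS case has a closed form: with the representer expansion f(x) = \sum_i \alpha_i K(x_i, x), minimizing the objective above gives the linear system (K + \gamma l I)\alpha = y. A minimal numpy sketch (the Gaussian kernel, the value of sigma, and the toy usage are illustrative assumptions, not from the talk):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Gram matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, gamma=1e-3, sigma=1.0):
    # Representer theorem: f(x) = sum_i alpha_i K(x_i, x).
    # Minimizing (1/l) ||y - K alpha||^2 + gamma alpha^T K alpha
    # leads to the linear system (K + gamma * l * I) alpha = y.
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + gamma * l * np.eye(l), y)

def rls_predict(alpha, X_train, X_new, sigma=1.0):
    # Evaluate f at new points via the kernel expansion.
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```

With a tiny gamma the fitted function essentially interpolates the labels; a larger gamma trades data fit for a smaller RKHS norm.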

What are RKHS ?

Hilbert spaces with a nice property: if two functions f, g are close in the distance derived from the inner product, then their values f(x), g(x) are close for every x.

Reproducing property: the evaluation functional L_x : f \mapsto f(x) is linear and continuous. By the Riesz representation theorem, there exists K_x \in H such that

L_x(f) = \langle f, K_x \rangle = f(x)

Kernel function of the RKHS:

K(x, z) = K_x(z) = \langle K_x, K_z \rangle

TTI-C Talk September 14, 2004 – p.8

Why RKHS ?

Rich function spaces with complexity control, e.g. the Gaussian kernel

K(x, z) = e^{-\|x - z\|^2 / 2\sigma^2}, \qquad \|f\|_K^2 \propto \int |\hat{f}(\omega)|^2 \, e^{\sigma^2 \|\omega\|^2 / 2} \, d\omega

Representer theorems show that the minimizer has the form

f^*(\cdot) = \sum_{i=1}^{l} \alpha_i K(x_i, \cdot)

and therefore

\|f^*\|_K^2 = \langle f^*, f^* \rangle = \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)

Motivates kernelization (KPCA, KFD, etc.).

Good empirical performance.

TTI-C Talk September 14, 2004 – p.9
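A valid kernel yields a symmetric positive semi-definite Gram matrix, which is exactly what makes \alpha^T K \alpha a squared norm. A quick numpy check for the Gaussian kernel (an illustrative sketch; the data and sigma are arbitrary):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = gaussian_gram(X, sigma=0.7)

# K is symmetric positive semi-definite, so for any representer
# expansion f = sum_i a_i K(x_i, .), ||f||_K^2 = a^T K a >= 0.
eigvals = np.linalg.eigvalsh(K)
a = rng.normal(size=20)
norm2 = a @ K @ a
```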

Known Marginal

If the marginal P_X is known, solve:

f^* = \arg\min_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2

Extrinsic and Intrinsic Regularization: \gamma_A controls complexity in the ambient space; \gamma_I controls complexity in the intrinsic geometry of P_X.

TTI-C Talk September 14, 2004 – p.10

Continuous Representer Theorem

Assume that the penalty term \|f\|_I is sufficiently smooth with respect to the RKHS norm \|f\|_K. Then the solution f^* to the optimization problem exists and admits the following representation

f^*(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + \int_M \alpha(z) K(z, x) \, dP(z)

where M = \mathrm{supp}(P_X) is the support of the marginal P_X.

TTI-C Talk September 14, 2004 – p.11

A Manifold Regularizer

If M, the support of the marginal, is a compact submanifold M \subset R^n, it seems natural to choose:

\|f\|_I^2 = \int_M \langle \nabla_M f, \nabla_M f \rangle

and to find f^* \in H_K that minimizes:

\frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \int_M \langle \nabla_M f, \nabla_M f \rangle

TTI-C Talk September 14, 2004 – p.12

Laplace Beltrami Operator

The intrinsic regularizer is a quadratic form involving the Laplace-Beltrami operator \Delta_M f = -\mathrm{div}(\nabla_M f) on the manifold:

\|f\|_I^2 = \int_M \langle \nabla_M f, \nabla_M f \rangle = \int_M f \, \Delta_M f

because calculus on manifolds (integration by parts) establishes that for any vector field X,

\int_M \langle \nabla_M f, X \rangle = - \int_M f \, \mathrm{div}(X)

and setting X = \nabla_M f gives the identity above.

TTI-C Talk September 14, 2004 – p.13

Passage to the Discrete

In reality, M is unknown and sampled only via examples \{x_i\}_{i=1}^{l+u}. Labels are not required for empirical estimates of \|f\|_I^2.

Manifold → Graph: the vertices are the examples \{x_i\}_{i=1}^{l+u}, with edge weights W_{ij} = e^{-\|x_i - x_j\|^2 / 4t} if x_i and x_j are neighbors, and 0 otherwise.

Laplace-Beltrami → Graph Laplacian: \Delta_M \to L = D - W, where D_{ii} = \sum_j W_{ij}.

\|f\|_I^2 \to \widehat{\|f\|}_I^2 = f^T L f = \frac{1}{2} \sum_{i,j} W_{ij} \, (f(x_i) - f(x_j))^2

TTI-C Talk September 14, 2004 – p.14
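The graph construction above fits in a few lines. A minimal numpy version (the kNN adjacency rule and the heat-kernel parameter t are illustrative choices) that also lets one verify the identity f^T L f = ½ Σ_ij W_ij (f(x_i) − f(x_j))² numerically:

```python
import numpy as np

def graph_laplacian(X, k=3, t=1.0):
    # Symmetrized k-nearest-neighbor graph with heat-kernel weights
    # W_ij = exp(-||x_i - x_j||^2 / (4 t)); Laplacian L = D - W.
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # skip self (distance 0)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (4 * t))
    W = np.maximum(W, W.T)                  # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W
    return L, W
```

Note that no labels appear anywhere above: the empirical intrinsic norm f^T L f is computed from the point cloud alone.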

Algorithms

We have motivated the following optimization problem: find a function f^* \in H_K that minimizes:

\frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(l+u)^2} f^T L f

Laplacian RLS: V(x_i, y_i, f) = (y_i - f(x_i))^2

Laplacian SVM: V(x_i, y_i, f) = \max(0, 1 - y_i f(x_i))

TTI-C Talk September 14, 2004 – p.15

Empirical Representer Theorem

The minimizer admits an expansion

f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x)

Proof: write any f \in H_K as

f = \sum_{i=1}^{l+u} \alpha_i K(x_i, \cdot) + f_\perp

with f_\perp orthogonal to the span of the K(x_i, \cdot). Then f(x_j) = \langle f, K_{x_j} \rangle = \sum_i \alpha_i K(x_i, x_j), so f_\perp affects neither the loss terms nor the intrinsic term, but only increases the norm. So f_\perp = 0 at the minimizer.

TTI-C Talk September 14, 2004 – p.16

Laplacian RLS

By the Representer Theorem, the problem becomes finite dimensional. For Laplacian RLS, we find \alpha^* \in R^{l+u} that minimizes:

\frac{1}{l} \|Y - J K \alpha\|^2 + \gamma_A \, \alpha^T K \alpha + \frac{\gamma_I}{(l+u)^2} \, \alpha^T K L K \alpha

where K is the Gram matrix, J = \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0) with l ones followed by u zeros, and Y = (y_1, \ldots, y_l, 0, \ldots, 0). The solution is:

\alpha^* = \left( J K + \gamma_A l I + \frac{\gamma_I \, l}{(l+u)^2} L K \right)^{-1} Y

TTI-C Talk September 14, 2004 – p.17
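A minimal numpy sketch of the closed-form Laplacian RLS solution above, assuming a Gaussian kernel and a kNN heat-kernel graph (both illustrative choices, not the talk's experimental setup):

```python
import numpy as np

def lap_rls(X, y, l, gamma_A=1e-2, gamma_I=1e-2, sigma=1.0, k=2, t=1.0):
    # X: all l+u points, with the l labeled points first; y: the l labels.
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))            # Gaussian Gram matrix
    W = np.zeros((n, n))                        # kNN heat-kernel graph
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (4 * t))
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W
    J = np.diag([1.0] * l + [0.0] * (n - l))    # zeroes out unlabeled rows
    Y = np.concatenate([y, np.zeros(n - l)])
    # alpha* = (J K + gamma_A l I + gamma_I l/(l+u)^2 L K)^(-1) Y
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    return np.linalg.solve(A, Y)
```

The in-sample values are f = K @ alpha; a new point x is handled by the same kernel expansion \sum_i \alpha_i K(x_i, x), which is the out-of-sample extension highlighted earlier.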

Laplacian SVM

For Laplacian SVMs, we solve a QP:

\beta^* = \arg\max_{\beta \in R^l} \sum_{i=1}^{l} \beta_i - \frac{1}{2} \beta^T Q \beta

subject to: \sum_{i=1}^{l} y_i \beta_i = 0, \quad 0 \le \beta_i \le \frac{1}{l}

where Q = Y J K \left( 2\gamma_A I + \frac{2\gamma_I}{(l+u)^2} L K \right)^{-1} J^T Y with Y = \mathrm{diag}(y_1, \ldots, y_l), and then invert a linear system:

\alpha^* = \left( 2\gamma_A I + \frac{2\gamma_I}{(l+u)^2} L K \right)^{-1} J^T Y \beta^*

TTI-C Talk September 14, 2004 – p.18
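The dual QP above has only l variables, so a generic constrained solver suffices for small problems. A hedged sketch (scipy's SLSQP, a Gaussian kernel, and a kNN graph are all illustrative assumptions; the bias term is omitted for brevity):

```python
import numpy as np
from scipy.optimize import minimize

def lap_svm(X, y, l, gamma_A=0.1, gamma_I=0.1, sigma=1.0, k=2, t=1.0):
    # X: all l+u points, labeled first; y: length-l labels in {-1, +1}.
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (4 * t))
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W
    J = np.zeros((l, n)); J[:, :l] = np.eye(l)  # selects labeled rows
    Yd = np.diag(y)
    Minv = np.linalg.inv(2 * gamma_A * np.eye(n)
                         + (2 * gamma_I / n**2) * (L @ K))
    Q = Yd @ J @ K @ Minv @ J.T @ Yd
    # Dual QP: max_beta sum(beta) - 0.5 beta^T Q beta
    #          s.t. sum_i y_i beta_i = 0, 0 <= beta_i <= 1/l
    res = minimize(lambda b: 0.5 * b @ Q @ b - b.sum(),
                   np.zeros(l), method="SLSQP",
                   bounds=[(0.0, 1.0 / l)] * l,
                   constraints=[{"type": "eq", "fun": lambda b: b @ y}])
    return Minv @ J.T @ Yd @ res.x              # expansion coefficients
```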

Manifold Regularization

Input: l labeled and u unlabeled examples
Output: f : R^n → R

Algorithm:
Construct the adjacency graph. Compute the Laplacian.
Choose a kernel K(x, z). Compute the Gram matrix K.
Choose \gamma_A, \gamma_I. (?)
Compute \alpha^*.
Output f^*(x) = \sum_{i=1}^{l+u} \alpha_i^* K(x_i, x)

TTI-C Talk September 14, 2004 – p.19

Unity of Learning

Supervised: SVM / RLS
Partially Supervised: Graph Regularization; Out-of-sample Extn.
Unsupervised: Graph Mincut; Spectral Clustering; Reg. Spectral Clust.; Out-of-sample Extn.

All of these objectives arise from the manifold regularization functional by keeping or dropping its three terms (the loss, the ambient norm \|f\|_K^2, and the intrinsic term f^T L f).

TTI-C Talk September 14, 2004 – p.20

Regularized Spectral Clustering

Unsupervised Manifold Regularization:

\min_{f \in H_K : \; \sum_i f(x_i) = 0, \; \sum_i f(x_i)^2 = 1} \; \gamma \|f\|_K^2 + f^T L f

Representer Theorem:

f^*(x) = \sum_{i=1}^{u} \alpha_i K(x_i, x)

leads to a generalized eigenvalue problem:

P \left( \gamma K + K L K \right) P v = \lambda \, P K^2 P v

and \alpha^* = v_1, the eigenvector with smallest eigenvalue; P projects orthogonal to the constant vector 1.

TTI-C Talk September 14, 2004 – p.21
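Substituting f = K\alpha turns the constrained problem above into a generalized eigenproblem restricted to the subspace where \sum_i f(x_i) = 0. A numpy sketch (the kernel, the graph, and the explicit subspace-plus-Cholesky-whitening route are illustrative choices):

```python
import numpy as np

def reg_spectral(X, gamma=1e-2, sigma=1.0, k=2, t=1.0):
    # Gram matrix K and graph Laplacian L, built as before.
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (4 * t))
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W
    # With f = K a: minimize gamma a^T K a + a^T K L K a
    # subject to a^T K^2 a = 1 and (K 1)^T a = 0.
    A = gamma * K + K @ L @ K
    B = K @ K + 1e-9 * np.eye(n)
    c = K @ np.ones(n)
    _, _, Vt = np.linalg.svd(c[None, :])
    N = Vt[1:].T                         # basis of {a : c^T a = 0}
    R = np.linalg.cholesky(N.T @ B @ N)  # whiten the constraint metric
    M = np.linalg.solve(R, np.linalg.solve(R, (N.T @ A @ N).T).T)
    _, vecs = np.linalg.eigh((M + M.T) / 2)
    alpha = N @ np.linalg.solve(R.T, vecs[:, 0])  # smallest eigenvector
    return K @ alpha                     # f(x_i): sign gives the clusters
```

On two well-separated clusters, the sign of the returned f recovers the cluster split, and the kernel expansion \sum_i \alpha_i K(x_i, x) extends the clustering to out-of-sample points.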

Experiments : Synthetic

[Figure: six panels of decision boundaries on 2-D synthetic data. Top row: SVM (γA = 0.03125, γI = 0) and Laplacian SVM with γA = 0.03125 and γI = 0.01, γI = 1. Bottom row: Laplacian SVM with γI = 1 and γA = 1e-06, 0.0001, 0.1.]

TTI-C Talk September 14, 2004 – p.22

Related Algorithms

Transductive SVMs [Joachims; Vapnik]: optimize over f and the labels of the unlabeled examples,

\min_{f, \, y_{l+1}, \ldots, y_{l+u}} \frac{1}{l} \sum_{i=1}^{l} (1 - y_i f(x_i))_+ + \frac{\lambda'}{u} \sum_{j=l+1}^{l+u} (1 - y_j f(x_j))_+ + \lambda \|f\|^2

Semi-supervised SVMs [Bennett, Fung et al]: each unlabeled example contributes the hinge loss of its better label,

\min_{f} \frac{1}{l} \sum_{i=1}^{l} (1 - y_i f(x_i))_+ + C \sum_{j=l+1}^{l+u} \min\left( (1 - f(x_j))_+, \, (1 + f(x_j))_+ \right) + \lambda \|f\|^2

Measure-based Regularization [Bousquet et al]:

\min_{f} \frac{1}{l} \sum_{i=1}^{l} V(f(x_i), y_i) + \lambda \int \langle \nabla f(x), \nabla f(x) \rangle \, dP(x)

TTI-C Talk September 14, 2004 – p.23

Experiments : Synthetic Data

[Figure: decision boundaries on 2-D synthetic data for SVM, Transductive SVM, and Laplacian SVM.]

TTI-C Talk September 14, 2004 – p.24

Experiments : Digits

[Figure: error rates on 45 binary classification problems: RLS vs LapRLS, SVM vs LapSVM, TSVM vs LapSVM; out-of-sample extension scatter plots of unlabeled-set vs test-set error for LapRLS and LapSVM; performance deviation of SVM (o) and TSVM (x) vs LapSVM deviation.]

TTI-C Talk September 14, 2004 – p.25

Experiments : Digits

[Figure: average error rate vs number of labeled examples (2 to 128) for SVM vs LapSVM and RLS vs LapRLS, on test (T) and unlabeled (U) sets.]

TTI-C Talk September 14, 2004 – p.26

Experiments : Speech

[Figure: error rates vs number of labeled speakers for RLS vs LapRLS and SVM vs TSVM vs LapSVM, on the unlabeled set and on the test set.]

TTI-C Talk September 14, 2004 – p.27

Experiments : Speech

[Figure: test-set error rate vs unlabeled-set error rate for RLS, LapRLS, SVM, and LapSVM (Experiments 1 and 2).]

TTI-C Talk September 14, 2004 – p.28

Experiments : Text

Method        PRBEP          Error
k-NN          73.2           13.3
SGT           86.2           6.2
Naive-Bayes   —              12.9
Cotraining    —              6.20
SVM           76.39 (5.6)    10.41 (2.5)
TSVM          88.15 (1.0)    5.22 (0.5)
LapSVM        87.73 (2.3)    5.41 (1.0)
RLS           73.49 (6.2)    11.68 (2.7)
LapRLS        86.37 (3.1)    5.99 (1.4)

TTI-C Talk September 14, 2004 – p.29

Experiments : Text

[Figure: PRBEP vs number of labeled examples (2 to 64) for RLS vs LapRLS and SVM vs LapSVM on unlabeled (U) and test (T) sets; LapSVM performance on unlabeled and test sets for U = 779−l, 350, 150.]

TTI-C Talk September 14, 2004 – p.30

Future Work

Generalization as a function of labeled and unlabeled examples.

Additional Structure: Structured Outputs, Invariances.

Active Learning, Feature Selection.

Efficient Algorithms: Linear Methods, Sparse Solutions.

Applications: Bioinformatics, Text, Speech, Vision, ...

TTI-C Talk September 14, 2004 – p.31

