Page 1: Sufficient Dimensionality Reduction with Irrelevance Statistics

Sufficient Dimensionality Reduction with Irrelevance Statistics

Amir Globerson1

Gal Chechik2

Naftali Tishby1

1 Center for Neural Computation and School of Computer Science and Engineering. The Hebrew University2 Robotics Lab, CS department, Stanford University

Page 2

• Goal: Find a simple representation of X which captures its relation to Y

• Q1: Clustering is not always the most appropriate description (e.g. PCA vs. clustering)

• Q2: Data may contain many structures, and not all of them are relevant to a given task

[Diagram: X → ? → Y]

Page 3

Talk layout

• Continuous feature extraction using SDR

• Using Irrelevance data in unsupervised learning

• SDR with irrelevance data

• Applications

Page 4

Continuous features

[Figure: a papers × terms matrix, with papers ordered on a continuous scale from Applications to Theory]

• What would be a simple representation of the papers?

• Clustering is not a good solution; the papers vary on a continuous scale.

• Look at the mean number of words in the following groups:

  – figure, performance, improvement, empirical

  – equation, inequality, integral

• Better to look at weighted means (e.g. "figure" is only loosely related to results)

• The means give a continuous index reflecting the content of the document

Page 5

Information in Expectations

• Represent $p(X|y)$ via the expected value of some function $\phi(x)$, where $\phi : X \to \mathbb{R}^d$

• Look at $\langle \phi(x) \rangle_{p(X|y)} = \sum_{x} \phi(x)\, p(x|y)$

• This gives a set of $|Y|$ values $\langle \phi_1(x) \rangle_{p(X|y)}, \langle \phi_2(x) \rangle_{p(X|y)}, \ldots$, one vector representing each conditional $p(X|y)$
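As a concrete sketch of the expectations above: with a toy vocabulary and two conditionals $p(X|y)$, the representation is just a weighted sum per $y$. The numbers and the one-dimensional $\phi$ below are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: 4 vocabulary terms, d = 1 feature phi : X -> R
phi = np.array([1.0, 0.5, -0.5, -1.0])

# Rows are the conditionals p(X|y1) and p(X|y2); each row sums to 1
cond = np.array([[0.4, 0.4, 0.1, 0.1],
                 [0.1, 0.1, 0.4, 0.4]])

# <phi(x)>_{p(X|y)} = sum_x phi(x) p(x|y): one value per y
expectations = cond @ phi   # -> array([ 0.45, -0.45])
```

The two conditionals get opposite continuous scores, mirroring the continuous "content index" of the papers example.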

Page 6

Examples

• Weighted sum of word counts in a document can be informative about content

• Weighted grey levels in specific image areas may reveal its identity

• Mean expression level can reveal tissue identity

• But what are the best features to use?

• Need a measure of information in expected values

Page 7

Quantifying information in expectations

• Possible measures?

  – $I(\phi(X); Y)$: equals $I(X;Y)$ for any one-to-one $\phi(x)$

  – $I\left(\tfrac{1}{n}\sum_{i=1}^{n} \phi(x_i);\, Y\right)$: goes to $H(Y)$ as $n$ grows

• Want to extract only the information related to the expected values

• Consider all distributions which have the given expected values, and choose the least informative one

Page 8

Quantifying information in expectations

• Define the set of distributions which agree with $p$ on the expected values of $\phi$ and on the marginals:

$$P(\phi, p) = \left\{ \tilde{p}(x,y) \;:\; \langle \phi(x) \rangle_{\tilde{p}(x|y)} = \langle \phi(x) \rangle_{p(x|y)} \;\forall y,\quad \tilde{p}(x) = p(x),\quad \tilde{p}(y) = p(y) \right\}$$

• We define the information in measuring $\phi(x)$ on $p(x,y)$ as

$$I_M(\phi, p) = \min_{\tilde{p} \in P(\phi, p)} I(\tilde{p})$$
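A minimal numerical check of membership in the constraint set: a candidate $\tilde{p}$ belongs to $P(\phi, p)$ when it matches $p$'s marginals and the conditional expectations of $\phi$. The function name and toy numbers are hypothetical; columns of the matrices index $y$.

```python
import numpy as np

def in_constraint_set(p, p_tilde, phi, tol=1e-9):
    """Check p_tilde in P(phi, p): same marginals over x and y, and the
    same conditional expectations <phi(x)>_{p(x|y)} for every y."""
    same_px = np.allclose(p.sum(axis=1), p_tilde.sum(axis=1), atol=tol)
    same_py = np.allclose(p.sum(axis=0), p_tilde.sum(axis=0), atol=tol)

    def cond_exp(q):
        cond = q / q.sum(axis=0, keepdims=True)  # columns are p(x|y)
        return phi @ cond                        # one expectation per y

    same_exp = np.allclose(cond_exp(p), cond_exp(p_tilde), atol=tol)
    return same_px and same_py and same_exp

p = np.array([[0.3, 0.2],
              [0.2, 0.3]])
phi = np.array([1.0, -1.0])

member = in_constraint_set(p, p, phi)        # p trivially belongs to its own set
nonmember = in_constraint_set(p, np.array([[0.5, 0.0],
                                           [0.0, 0.5]]), phi)
```

The second candidate keeps both marginals but changes the conditional expectations of $\phi$, so it falls outside the set.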

Page 9

Sufficient Dimensionality Reduction (SDR)

• Find $\phi(x)$ which maximizes $I_M(\phi, p)$

• Equivalent to finding the maximum-likelihood parameters for

$$p(x,y) = \frac{1}{Z} \exp\left( \sum_{i=1}^{d} \phi_i(x)\, \psi_i(y) + A(x) + B(y) \right)$$

• Can be done using an iterative projection algorithm (Globerson & Tishby, JMLR 2003)

• Produces useful features for document analysis

• But what if $p(x,y)$ contains many structures?
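The exponential form above can be instantiated directly. The sketch below, with arbitrary toy parameters, just builds the joint and verifies it normalizes; it is not the fitting algorithm from the paper.

```python
import numpy as np

def exp_form_joint(phi, psi, A, B):
    """p(x,y) = (1/Z) exp(sum_i phi_i(x) psi_i(y) + A(x) + B(y)).
    phi: |X| x d, psi: |Y| x d, A: |X|, B: |Y| (toy inputs)."""
    logits = phi @ psi.T + A[:, None] + B[None, :]
    unnorm = np.exp(logits)
    return unnorm / unnorm.sum()  # dividing by Z normalizes the joint

rng = np.random.default_rng(0)
phi = rng.normal(size=(5, 2))   # 5 x-values, d = 2 features
psi = rng.normal(size=(3, 2))   # 3 y-values
p = exp_form_joint(phi, psi, np.zeros(5), np.zeros(3))
```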

Page 10

Talk layout

• Feature extraction using SDR

• Irrelevance data in unsupervised learning

• Enhancement of SDR with irrelevance data

• Applications

Page 11

Relevant and Irrelevant Structures

• Data may contain structures we don’t want to learn

For example:

• Face recognition: face geometry is important, illumination is not

• Speech recognition: the spectral envelope is important, pitch is not (in English)

• Document classification: content is important, style is not

• Gene classification: a given gene may be involved in pathological as well as normal pathways

• Relevance is not absolute, it is task dependent

Page 12

Irrelevance Data

• Data sets which contain only irrelevant structures are often available (Chechik and Tishby, NIPS 2002):

  – Images of one person under different illumination conditions

  – Recordings of one word uttered in different intonations

  – Documents of similar content but different styles

  – Gene expression patterns from healthy tissues

• Find features which avoid the irrelevant ones

Page 13

Learning with Irrelevance Data

• Given a model of the data $f$, $Q(f,D)$ is some quantifier of the goodness of feature $f$ on the dataset $D$ (e.g. likelihood, information)

• We want to find $\max_f Q(f,D^+) - Q(f,D^-)$, where $D^+$ is the main data and $D^-$ the irrelevance data

• Has been demonstrated successfully (Chechik & Tishby, 2002) for the case where

  – $f = p(T|X)$, soft clustering

  – $Q(f,Y) = I(T;Y)$

• The principle is general and can be applied to any modeling scheme
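The max-of-differences principle can be illustrated with any plug-in quality measure Q. Below, Q is simply the variance of the projected data (a stand-in, not the information measures used in the talk), and the candidate features are two axis directions; all names and data are hypothetical.

```python
import numpy as np

def pick_feature(candidates, d_plus, d_minus, q):
    """Return the candidate feature f maximizing Q(f, D+) - Q(f, D-)."""
    scores = [q(d_plus @ f) - q(d_minus @ f) for f in candidates]
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(1)
# D+ spreads along axis 0; D- spreads along axis 1 (the structure to avoid)
d_plus = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
d_minus = rng.normal(size=(200, 2)) * np.array([1.0, 3.0])

candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
best = pick_feature(candidates, d_plus, d_minus, q=np.var)
# best is the axis-0 direction: informative for D+, uninformative for D-
```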

Page 14

Talk layout

• Information in expectations (SDR)

• Irrelevance data in unsupervised learning

• Enhancement of SDR with irrelevance data

• Applications

Page 15

Adding Irrelevance Statistics to SDR

• Using $I_M(\phi, p)$ as our goodness-of-feature quantifier, we can use two distributions: a relevant $p^+(X,Y)$ and an irrelevant $p^-(X,Y)$

• The optimal feature is then

$$\phi^*(x) = \arg\max_{\phi} L(\phi), \qquad L(\phi) = I_M(\phi, p^+) - \gamma\, I_M(\phi, p^-)$$

• For $\gamma = 0$ we recover SDR

Page 16

Calculating $\phi^*(x)$

• When $\gamma = 0$, an iterative algorithm can be devised (Globerson and Tishby, 2002)

• Otherwise, the gradient of $L(\phi)$ can be calculated and ascended:

$$\frac{\partial L}{\partial \phi(x)} = p^+(x)\left( \langle \psi(y) \rangle_{p^+(y|x)} - \langle \psi(y) \rangle_{p^{+*}(y|x)} \right) - \gamma\, p^-(x)\left( \langle \psi(y) \rangle_{p^-(y|x)} - \langle \psi(y) \rangle_{p^{-*}(y|x)} \right)$$

where $p^{+*}$ and $p^{-*}$ denote the minimizing distributions inside $I_M(\phi, p^+)$ and $I_M(\phi, p^-)$
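The "calculate the gradient and ascend" step can be sketched generically. The toy below ascends a placeholder concave objective with finite-difference gradients; in the actual method the analytic gradient of $L(\phi)$ would be used instead. All names are hypothetical.

```python
import numpy as np

def ascend(objective, phi0, lr=0.1, steps=200, eps=1e-5):
    """Gradient ascent using central finite-difference gradients."""
    phi = phi0.astype(float)
    for _ in range(steps):
        grad = np.zeros_like(phi)
        for i in range(phi.size):
            step = np.zeros_like(phi)
            step[i] = eps
            grad[i] = (objective(phi + step) - objective(phi - step)) / (2 * eps)
        phi = phi + lr * grad
    return phi

# Placeholder concave objective, maximized at (1, -2)
objective = lambda v: -((v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2)
opt = ascend(objective, np.zeros(2))
```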

Page 17

Synthetic Example

[Figure: synthetic example, features learned from the main data D+ while avoiding the structure in D−, as γ varies from 0 to 1]

Page 18

Phase Transitions

[Figure: the learned feature φ(x), showing phase transitions as the tradeoff parameter varies]

Page 19

Talk layout

• Feature extraction using SDR

• Irrelevance data in unsupervised learning

• SDR with irrelevance data

• Applications

Page 20

Converting Images into Distributions

[Figure: each image y converted into a distribution p(X|y) over pixel positions X]
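One simple way to realize "images as distributions", used here only as an illustration (the slide's exact construction is not shown in the transcript): treat the gray levels as unnormalized mass over pixel positions X, giving one conditional p(X|y) per image y.

```python
import numpy as np

def image_to_distribution(img):
    """Normalize gray levels into a distribution over pixel positions."""
    flat = img.ravel().astype(float)
    return flat / flat.sum()

img = np.array([[0, 128],
                [64, 64]], dtype=np.uint8)
p_x_given_y = image_to_distribution(img)  # -> [0.0, 0.5, 0.25, 0.25]
```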

Page 21

Extracting a single feature

• The AR dataset consists of images of 50 men and women at different illuminations and postures

• We took the following distributions:

  – Relevant: 50 men at two illumination conditions (right and left)

  – Irrelevant: 50 women at the same illumination conditions

  – Expected features: discriminate between men, but not between illuminations

Page 22

Results for a single feature

Page 23

Results for a single feature

Page 24

Face clustering task

• Took 5 men with 26 different postures

• Task is to cluster the images according to their identity

• Took 26 images of another man as irrelevance data

• Performed dimensionality reduction using several methods (PCA, OPCA, CPCA and SDR-IS) and measured precision for the reduced data
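The evaluation pipeline on this slide (reduce, cluster, score precision) can be sketched end-to-end with numpy. PCA stands in for the reduction step, a basic k-means for the clustering, and precision is measured as cluster purity; the synthetic "identities" below are hypothetical stand-ins for the face images.

```python
import numpy as np

def pca_reduce(X, k):
    """Project onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans(X, k, iters=50, seed=0):
    """Basic Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def precision(labels, identities):
    """Cluster purity: dominant identity per cluster, averaged over points."""
    hits = 0
    for j in np.unique(labels):
        hits += np.bincount(identities[labels == j]).max()
    return hits / len(labels)

rng = np.random.default_rng(2)
# Two well-separated synthetic "identities" in 10 dimensions
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 10)),
               rng.normal(3.0, 0.3, size=(20, 10))])
ids = np.array([0] * 20 + [1] * 20)

Z = pca_reduce(X, 2)
prec = precision(kmeans(Z, 2), ids)
```

With strongly separated identities the purity should be near 1; on real face data the interesting comparison is between reduction methods, as on the next slide.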

Page 25

Precision results

Page 26

Conclusions

• Presented a method for feature extraction based on expected values of X

• Showed how it can be augmented to avoid irrelevant structures

• Future Work

  – Eliminate dependence on the dimension of Y via compression constraints

  – Extend to the multivariate case (graphical models)

