Multivariate Information Bottleneck

Post on 10-Jan-2016


Multivariate Information Bottleneck. Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby, Hebrew University. PowerPoint presentation.


Multivariate Information Bottleneck

Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby

Hebrew University

Data Analysis

[Figure: population statistics by Age]

Information Bottleneck

Can we cluster “age” values into clusters that are predictive of education level?

[Figure: education level (None, High school, Some college, Bachelor's degree, PhD) by age]


Can we also cluster education attained so that the clusters are predictive of age?


Our contribution

Generalize Information Bottleneck:

Generic principle for specifying systems of interacting clusters

Characterization of the solution for these specs

General purpose methods for constructing solutions

Information Bottleneck [Tishby, Pereira & Bialek 99]

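Input: a joint distribution P(A,B). We look for a soft clustering P(T|A) of the values of A; together with P(A,B) it determines P(T,B).

Minimize: $I(T;A) - \beta\, I(T;B)$

$I(T;A)$: compression (minimizing it means losing information about A)

$I(T;B)$: preserved information about B

$\beta$: the tradeoff between the two terms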
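The tradeoff can be made concrete for discrete variables. Below is a minimal sketch in Python with NumPy, assuming P(A,B) is given as a 2-D array and the soft clustering P(T|A) as an array whose rows sum to 1; β is the tradeoff parameter, logarithms are natural (nats), and the function names are illustrative, not taken from the presentation.

```python
import numpy as np


def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) in nats for a joint distribution stored as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0  # convention: 0 * log 0 = 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))


def ib_objective(p_ab: np.ndarray, p_t_given_a: np.ndarray, beta: float) -> float:
    """I(T;A) - beta * I(T;B) for the soft clustering P(T|A).

    p_ab:        shape (|A|, |B|), the input joint P(A,B)
    p_t_given_a: shape (|A|, |T|), row a holds P(T | a)
    """
    p_a = p_ab.sum(axis=1)
    p_at = p_t_given_a * p_a[:, None]  # P(a, t)
    p_tb = p_t_given_a.T @ p_ab        # P(t, b) = sum_a P(t|a) P(a, b): T sees B only through A
    return mutual_information(p_at) - beta * mutual_information(p_tb)


p_ab = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(ib_objective(p_ab, np.ones((2, 1)), beta=2.0))  # one cluster: nothing kept, nothing paid
print(ib_objective(p_ab, np.eye(2), beta=2.0))        # T = A: I(T;A) = H(A), I(T;B) = I(A;B)
```

Comparing the two values as β varies shows the tradeoff between compression and preserved information.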

Information Bottleneck Reexamined

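G_in (actual distribution): $P(A,B)\,P(T \mid A)$, where P(A,B) is the input and P(T|A) are the parameters; T is a child of A.

G_out (desired independencies): $\mathrm{Ind}(A;B \mid T)$, i.e., T should account for all the dependence between A and B.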

Example: Symmetric Bottleneck

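Simultaneous clustering of both A and B, via $P(T_A \mid A)$ and $P(T_B \mid B)$

G_in: $T_A$ is a child of A and $T_B$ is a child of B; A and B are jointly distributed according to the input P(A,B).

G_out: [diagram over A, B, $T_A$, $T_B$]

Goal: $T_A$ captures the information A contains about B, and $T_B$ captures the information B contains about A.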

General Principle

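Input: $P(X_1,\dots,X_n)$

G_in (compression): each $T_j$ clusters the values of its parents $\mathrm{Pa}_j$

G_out: desired (conditional) independencies

Goal: find $P(T_j \mid \mathrm{Pa}_j)$ in G_in so that the resulting distribution “matches” G_out

[Diagram: observed variables $X_1, X_2, \dots, X_n$ and cluster variables $T_1, \dots, T_k$]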

Multi-information


The information that random variables jointly contain about each other

Generalizes mutual information

$$\mathcal{I}(X_1,\dots,X_n) = E_P\!\left[\log \frac{P(X_1,\dots,X_n)}{P(X_1)\cdots P(X_n)}\right]$$
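As a sketch of the definition, the multi-information of a discrete joint distribution can be computed directly as the KL divergence between the joint and the product of its marginals. This Python/NumPy version assumes the joint is stored as an n-dimensional array (axis i indexes the values of X_i); the function name is illustrative.

```python
import numpy as np


def multi_information(p: np.ndarray) -> float:
    """I(X_1,...,X_n) = E_P[log P(X_1..X_n) / (P(X_1)...P(X_n))], in nats.

    p is an n-dimensional array holding the joint distribution;
    axis i indexes the values of X_i.
    """
    n = p.ndim
    # Product of the single-variable marginals, broadcast to the joint's shape.
    prod_marginals = np.ones_like(p)
    for i in range(n):
        other_axes = tuple(j for j in range(n) if j != i)
        prod_marginals = prod_marginals * p.sum(axis=other_axes, keepdims=True)
    mask = p > 0  # convention: 0 * log 0 = 0
    return float(np.sum(p[mask] * np.log(p[mask] / prod_marginals[mask])))


# Three perfectly correlated fair bits: X1 = X2 = X3.
p = np.zeros((2, 2, 2))
p[0, 0, 0] = p[1, 1, 1] = 0.5
print(multi_information(p))  # 2 log 2, about 1.386 nats
```

With n = 2 the same function returns the ordinary mutual information $I(X_1;X_2)$; the three-bit example shows that the joint quantity can exceed every pairwise term (each pair shares only log 2).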

Graph Projection

Let G be a DAG

Define: $KL(P \,\|\, G) = \min_{Q \models G} KL(P \,\|\, Q)$, the divergence from P to the set of distributions consistent with G (a subset of all possible distributions).


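Proposition: $KL(P \,\|\, G) = \mathcal{I}(X_1,\dots,X_n) - \mathcal{I}^G$, where $\mathcal{I}(X_1,\dots,X_n)$ is the real multi-information and $\mathcal{I}^G$ is the multi-information as though P were consistent with G.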
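For a DAG G, the distribution consistent with G that is closest to P in $KL(P \,\|\, Q)$ is the product of P's own conditionals, $Q^*(x) = \prod_i P(x_i \mid \mathrm{pa}_i)$ (the usual maximum-likelihood fit of a Bayesian network). The sketch below, in Python with NumPy, uses this to compute $KL(P \,\|\, G)$ and compares it with the multi-information minus the sum of local terms $I(X_i;\mathrm{Pa}_i)$. The graph encoding (a dict from variable index to a tuple of parent indices) and all function names are hypothetical choices for illustration.

```python
import numpy as np


def mi_2d(p_xy):
    """I(X;Y) in nats for a 2-D joint array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log(p_xy[m] / (p_x * p_y)[m])))


def multi_information(p):
    """I(X_1,...,X_n): KL from the joint to the product of its marginals."""
    prod = np.ones_like(p)
    for i in range(p.ndim):
        others = tuple(j for j in range(p.ndim) if j != i)
        prod = prod * p.sum(axis=others, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / prod[m])))


def local_information(p, i, parents):
    """I(X_i; Pa_i), computed from the marginal over the family {X_i} + Pa_i."""
    if not parents:
        return 0.0
    family = sorted({i, *parents})
    p_fam = p.sum(axis=tuple(j for j in range(p.ndim) if j not in family))
    p_fam = np.moveaxis(p_fam, family.index(i), 0)   # put X_i on axis 0
    return mi_2d(p_fam.reshape(p_fam.shape[0], -1))  # treat the parents jointly


def projection(p, graph):
    """Q*(x) = prod_i P(x_i | pa_i), the projection of P onto distributions consistent with G."""
    q = np.ones_like(p)
    for i, parents in graph.items():
        family = {i, *parents}
        p_fam = p.sum(axis=tuple(j for j in range(p.ndim) if j not in family), keepdims=True)
        p_par = p.sum(axis=tuple(j for j in range(p.ndim) if j not in parents), keepdims=True)
        q = q * np.divide(p_fam, p_par, out=np.zeros_like(p_fam), where=p_par > 0)
    return q


def kl(p, q):
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))


rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()
chain = {0: (), 1: (0,), 2: (1,)}  # X0 -> X1 -> X2
i_graph = sum(local_information(p, i, pa) for i, pa in chain.items())
print(kl(p, projection(p, chain)), multi_information(p) - i_graph)  # the two numbers agree
```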

Multi-information & Bayesian Networks

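Proposition: If P is consistent with G, i.e., $P(X_1,\dots,X_n) = \prod_i P(X_i \mid \mathrm{Pa}_i^G)$, then

$$\mathcal{I}(X_1,\dots,X_n) = \sum_i I(X_i;\mathrm{Pa}_i^G)$$

Define the sum of local interactions

$$\mathcal{I}^G = \sum_i I(X_i;\mathrm{Pa}_i^G)$$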
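A quick numerical check of this proposition, as a sketch in Python with NumPy: build a distribution that factorizes over the chain X0 → X1 → X2 from randomly drawn conditional tables, then compare its multi-information with $\mathcal{I}^G = \sum_i I(X_i;\mathrm{Pa}_i^G)$. The chain, the table sizes, and the helper names are illustrative assumptions.

```python
import numpy as np


def mi_2d(p_xy):
    """I(X;Y) in nats for a 2-D joint array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log(p_xy[m] / (p_x * p_y)[m])))


def multi_information(p):
    """I(X_1,...,X_n): KL from the joint to the product of its marginals."""
    prod = np.ones_like(p)
    for i in range(p.ndim):
        others = tuple(j for j in range(p.ndim) if j != i)
        prod = prod * p.sum(axis=others, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / prod[m])))


def local_information(p, i, parents):
    """I(X_i; Pa_i) from the marginal over {X_i} + Pa_i."""
    if not parents:
        return 0.0
    family = sorted({i, *parents})
    p_fam = p.sum(axis=tuple(j for j in range(p.ndim) if j not in family))
    p_fam = np.moveaxis(p_fam, family.index(i), 0)
    return mi_2d(p_fam.reshape(p_fam.shape[0], -1))


def graph_information(p, graph):
    """I^G = sum_i I(X_i; Pa_i^G), the sum of local interactions."""
    return sum(local_information(p, i, pa) for i, pa in graph.items())


rng = np.random.default_rng(1)


def random_cpt(*shape):
    """Random conditional table; the last axis is normalized to sum to 1."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)


p_x0 = random_cpt(2)              # P(x0)
p_x1_given_x0 = random_cpt(2, 3)  # P(x1 | x0)
p_x2_given_x1 = random_cpt(3, 2)  # P(x2 | x1)
p = np.einsum("a,ab,bc->abc", p_x0, p_x1_given_x0, p_x2_given_x1)

chain = {0: (), 1: (0,), 2: (1,)}
print(multi_information(p), graph_information(p, chain))  # equal up to rounding
```

For a joint that does not factorize over the chain, the multi-information is at least $\mathcal{I}^G$, since the difference is a KL divergence.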

Optimizing Criteria

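Two goals: lose information with respect to G_in, and attain the conditional independencies in G_out

Optimization objective:

$$\mathcal{L} = \mathcal{I}^{G_{in}} + \gamma\, KL(P \,\|\, G_{out})$$

The first term forces the clusters to compress; the second penalizes violations of the conditional independencies in G_out. Here $\gamma \ge 0$ is a tradeoff parameter.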

Additional Interpretation

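Writing the objective as $\mathcal{L} = \mathcal{I}^{G_{in}} + \gamma\, KL(P \,\|\, G_{out})$ and using properties of $KL(P \,\|\, G)$ (the graph-projection proposition, plus $\mathcal{I}(X_1,\dots,X_n,T_1,\dots,T_k) = \mathcal{I}^{G_{in}}$ because P is consistent with G_in), we can rewrite

$$\mathcal{L} = \mathcal{I}^{G_{in}} + \gamma\left(\mathcal{I}^{G_{in}} - \mathcal{I}^{G_{out}}\right) = (1+\gamma)\left(\mathcal{I}^{G_{in}} - \beta\,\mathcal{I}^{G_{out}}\right), \qquad \beta = \frac{\gamma}{1+\gamma}$$

Thus, we can instead minimize

$$\mathcal{L}' = \mathcal{I}^{G_{in}} - \beta\,\mathcal{I}^{G_{out}}$$

Minimize the information in G_in; maximize the information in G_out.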

Minimization Objective - Example

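Symmetric bottleneck:

$$\mathcal{L} = I(T_A;A) + I(T_B;B) - \beta\, I(T_A;T_B)$$

[Diagrams: G_in with $A \to T_A$ and $B \to T_B$; G_out over A, B, $T_A$, $T_B$]

Recall

$$P(T_A,T_B) = \sum_{a,b} P(a,b)\,P(T_A \mid a)\,P(T_B \mid b)$$

where P(a,b) is the input (fixed), and $P(T_A \mid a)$ and $P(T_B \mid b)$ are the parameters we can control.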
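To make the example concrete, the sketch below evaluates this objective for discrete A and B with NumPy, computing $P(T_A,T_B)$ exactly by the sum over a and b (written as a matrix product). The array layouts and the function name are assumptions made for illustration; β is the tradeoff parameter.

```python
import numpy as np


def mutual_information(p_xy):
    """I(X;Y) in nats for a 2-D joint array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log(p_xy[m] / (p_x * p_y)[m])))


def symmetric_objective(p_ab, p_ta_given_a, p_tb_given_b, beta):
    """L = I(T_A;A) + I(T_B;B) - beta * I(T_A;T_B).

    p_ab:          (|A|, |B|)    input joint P(A,B), fixed
    p_ta_given_a:  (|A|, |T_A|)  row a is P(T_A | a), a parameter we control
    p_tb_given_b:  (|B|, |T_B|)  row b is P(T_B | b), a parameter we control
    """
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    p_a_ta = p_ta_given_a * p_a[:, None]  # P(a, t_A)
    p_b_tb = p_tb_given_b * p_b[:, None]  # P(b, t_B)
    # P(t_A, t_B) = sum_{a,b} P(t_A|a) P(a,b) P(t_B|b)
    p_ta_tb = p_ta_given_a.T @ p_ab @ p_tb_given_b
    return (mutual_information(p_a_ta) + mutual_information(p_b_tb)
            - beta * mutual_information(p_ta_tb))


p_ab = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
# Keeping every value: I(T_A;A) = H(A), I(T_B;B) = H(B), I(T_A;T_B) = I(A;B).
print(symmetric_objective(p_ab, np.eye(2), np.eye(2), beta=1.0))
```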

Characterization of Solutions

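Thm: The parameters $\{P(T_j \mid \mathrm{Pa}_j)\}$ are a stationary point of the objective if and only if, for every j,

$$P(t_j \mid \mathrm{pa}_j) = \frac{P(t_j)}{Z(\mathrm{pa}_j,\beta)} \exp\{-\beta\, d(t_j,\mathrm{pa}_j)\}$$

In particular, every minimal point satisfies these equations. Here $d(t_j,\mathrm{pa}_j)$ measures the “distortion” between $t_j$ and $\mathrm{pa}_j$, and $Z(\mathrm{pa}_j,\beta)$ is a normalization constant.

For example, in the symmetric bottleneck:

$$d(t_A, a) = KL\big(P(T_B \mid a) \,\|\, P(T_B \mid t_A)\big)$$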
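A sketch of one step of this self-consistent equation for the $T_A$ side of the symmetric bottleneck, in Python with NumPy. It evaluates $P(t_A)$ and $P(T_B \mid t_A)$ at the current parameters, computes $d(t_A,a) = KL(P(T_B \mid a) \,\|\, P(T_B \mid t_A))$, and renormalizes $P(t_A)\exp\{-\beta d\}$ over $t_A$ (that normalization plays the role of $Z(a,\beta)$). It assumes every P(a) > 0; the small eps floor that avoids log(0) and the function name are additions for illustration.

```python
import numpy as np


def update_ta(p_ab, q_ta, q_tb, beta, eps=1e-12):
    """One self-consistent update of P(T_A | A) in the symmetric bottleneck.

    p_ab: (|A|, |B|) input joint; q_ta: (|A|, |T_A|) current P(T_A | a);
    q_tb: (|B|, |T_B|) current P(T_B | b). Assumes every P(a) > 0.
    """
    p_a = p_ab.sum(axis=1)
    p_tb_given_a = (p_ab / p_a[:, None]) @ q_tb       # P(T_B | a) = sum_b P(b|a) P(T_B|b)
    p_ta = p_a @ q_ta                                  # P(t_A)
    p_ta_tb = (q_ta * p_a[:, None]).T @ p_tb_given_a   # P(t_A, t_B); T_A, T_B independent given a
    p_tb_given_ta = p_ta_tb / np.maximum(p_ta[:, None], eps)
    # d[a, t] = sum_{t_B} P(t_B|a) log(P(t_B|a) / P(t_B|t_A = t))
    log_ratio = (np.log(np.maximum(p_tb_given_a[:, None, :], eps))
                 - np.log(np.maximum(p_tb_given_ta[None, :, :], eps)))
    d = np.sum(p_tb_given_a[:, None, :] * log_ratio, axis=2)
    logits = np.log(np.maximum(p_ta, eps))[None, :] - beta * d
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability only
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)            # dividing by Z(a, beta)


p_ab = np.array([[0.3, 0.1, 0.05],
                 [0.05, 0.2, 0.3]])
rng = np.random.default_rng(0)
q_ta = rng.dirichlet(np.ones(2), size=2)  # random initial P(T_A | a)
q_tb = rng.dirichlet(np.ones(2), size=3)  # random initial P(T_B | b)
print(update_ta(p_ab, q_ta, q_tb, beta=5.0))  # rows are the updated P(T_A | a)
```

With β = 0 the distortion is ignored and every row collapses to the marginal $P(t_A)$, which is a quick sanity check on the normalization.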

Finding Solutions

How can we find solutions?

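Asynchronous update: pick an index j, then update $P(T_j \mid \mathrm{Pa}_j)$ using

$$P(t_j \mid \mathrm{pa}_j) \leftarrow \frac{P(t_j)}{Z(\mathrm{pa}_j,\beta)} \exp\{-\beta\, d(t_j,\mathrm{pa}_j)\}$$

with the right-hand side evaluated at the current parameters.

Theorem: asynchronous updates converge to (local) minima.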
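The asynchronous scheme can be sketched for the symmetric bottleneck by alternating between the two indices j = T_A and j = T_B, each time applying the update with the other side held at its latest value. This Python/NumPy sketch assumes the symmetric-bottleneck distortion $d(t_A,a) = KL(P(T_B \mid a) \,\|\, P(T_B \mid t_A))$ (and its mirror image for $T_B$), random Dirichlet initialization, and a simple max-change stopping rule; these choices, the names, and the eps floor are illustrative, not the authors' implementation.

```python
import numpy as np


def update_side(p_xy, q_tx, q_ty, beta, eps=1e-12):
    """Self-consistent update of P(T_X | X) given the other side's P(T_Y | Y).

    p_xy: (|X|, |Y|) joint; q_tx: (|X|, |T_X|); q_ty: (|Y|, |T_Y|).
    Uses d(t_X, x) = KL(P(T_Y | x) || P(T_Y | t_X)); assumes every P(x) > 0.
    """
    p_x = p_xy.sum(axis=1)
    p_ty_given_x = (p_xy / p_x[:, None]) @ q_ty
    p_tx = p_x @ q_tx
    p_tx_ty = (q_tx * p_x[:, None]).T @ p_ty_given_x
    p_ty_given_tx = p_tx_ty / np.maximum(p_tx[:, None], eps)
    log_ratio = (np.log(np.maximum(p_ty_given_x[:, None, :], eps))
                 - np.log(np.maximum(p_ty_given_tx[None, :, :], eps)))
    d = np.sum(p_ty_given_x[:, None, :] * log_ratio, axis=2)
    logits = np.log(np.maximum(p_tx, eps))[None, :] - beta * d
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def symmetric_bottleneck(p_ab, n_ta, n_tb, beta, max_iter=200, tol=1e-9, seed=0):
    """Asynchronous updates, alternating between j = T_A and j = T_B."""
    rng = np.random.default_rng(seed)
    q_ta = rng.dirichlet(np.ones(n_ta), size=p_ab.shape[0])
    q_tb = rng.dirichlet(np.ones(n_tb), size=p_ab.shape[1])
    for _ in range(max_iter):
        new_ta = update_side(p_ab, q_ta, q_tb, beta)      # update P(T_A | A)
        new_tb = update_side(p_ab.T, q_tb, new_ta, beta)  # then P(T_B | B), using the fresh P(T_A | A)
        change = max(np.abs(new_ta - q_ta).max(), np.abs(new_tb - q_tb).max())
        q_ta, q_tb = new_ta, new_tb
        if change < tol:
            break
    return q_ta, q_tb


# Toy joint with two blocks of co-occurring values.
p_ab = np.array([[0.20, 0.05, 0.00, 0.00],
                 [0.05, 0.20, 0.00, 0.00],
                 [0.00, 0.00, 0.20, 0.05],
                 [0.00, 0.00, 0.05, 0.20]])
q_ta, q_tb = symmetric_bottleneck(p_ab, n_ta=2, n_tb=2, beta=10.0)
print(np.round(q_ta, 3))
```

Because the theorem only guarantees convergence to a local minimum, running from several seeds and keeping the solution with the lowest objective is a common practical choice.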

Example - 20 newsgroup

20,000 messages from 20 newsgroups [Lang 1995]

A: the newsgroup of the message; B: a word in the message

P(a,b): the probability that choosing a random position in the corpus would select word b in a message in newsgroup a

We applied the symmetric bottleneck to both attributes
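A sketch of how P(a,b) could be estimated from raw messages in Python with NumPy, following the definition above: count every token position, keyed by (newsgroup, word), and normalize by the total number of positions. It assumes messages are already tokenized into word lists; the toy corpus and function name are hypothetical, and the preprocessing used for the actual experiment is not specified here.

```python
from collections import Counter

import numpy as np


def joint_newsgroup_word(corpus):
    """Estimate P(a, b): pick a random token position in the whole corpus;
    a is the newsgroup of its message and b is the word at that position.

    corpus: iterable of (newsgroup, list_of_tokens) pairs, one per message.
    """
    counts = Counter()
    for group, tokens in corpus:
        for word in tokens:
            counts[(group, word)] += 1
    groups = sorted({g for g, _ in counts})
    words = sorted({w for _, w in counts})
    g_idx = {g: i for i, g in enumerate(groups)}
    w_idx = {w: j for j, w in enumerate(words)}
    p = np.zeros((len(groups), len(words)))
    for (g, w), c in counts.items():
        p[g_idx[g], w_idx[w]] = c
    return p / p.sum(), groups, words


# Hypothetical toy corpus (not the 20 Newsgroups data).
corpus = [
    ("sci.space", ["orbit", "launch", "orbit"]),
    ("rec.autos", ["car", "engine"]),
    ("sci.space", ["launch"]),
]
p_ab, groups, words = joint_newsgroup_word(corpus)
print(groups)  # ['rec.autos', 'sci.space']
print(words)   # ['car', 'engine', 'launch', 'orbit']
```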

20 Newsgroups: Symmetric Bottleneck

[Figure: joint distribution over newsgroups and words]

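20 Newsgroups: Symmetric Bottleneck

Newsgroup clusters:

alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*

comp.*, misc.forsale, sci.crypt, sci.electronics

Word clusters:

car, turkish, game, team, jesus, gun, hockey, …

x, file, image, encryption, window, dos, mac, …

[Figure: $P(T_D, T_W)$, the joint distribution of newsgroup clusters and word clusters]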


20 Newsgroups: Symmetric Bottleneck

Word cluster: atheists, christianity, jesus, bible, sin, faith, …

Newsgroup cluster: alt.atheism, soc.religion.christian, talk.religion.misc

[Figure: $P(T_D, T_W)$ over newsgroup clusters and word clusters]

Discussion

General framework: defines a new family of optimization problems

… and their solutions

Future directions:

Additional algorithms: agglomerative solutions

Relation to generative models

Parametric constraints in G_out

Example: Parallel Bottleneck

[Diagrams: G_in and G_out over A, B, $T_1$, $T_2$]

)];,();([);();( 212111 BTTITTIBTIATIL

))|()|((

)),|(),|((),(

aBB

BaBA

tTPaTPKL

TtBPTaBPKLatd