
Variational Inference for the Indian Buffet Process

Transcript
Page 1: Variational Inference for the Indian Buffet Process

Variational Inference for the Indian Buffet Process

Finale Doshi-Velez, Kurt T. Miller, Jurgen Van Gael and Yee Whye Teh

AISTATS 2009

Presented by: John Paisley, Duke University, Dept. of ECE

Page 2: Variational Inference for the Indian Buffet Process

Introduction

• This paper provides variational inference equations for the stick-breaking construction of the Indian buffet process (IBP). In addition, bounds are given on how closely a truncated stick-breaking approximation matches the infinite IBP.

• Outline of Presentation
– Review of the IBP and its stick-breaking construction
– Variational inference for the IBP
– Truncation error bounds for variational inference
– Results on a linear-Gaussian model for toy and real data

Page 3: Variational Inference for the Indian Buffet Process

Indian Buffet Process

1. The first customer selects Poisson(α) features.

2. The ith customer selects feature k with probability m_k / i, the fraction of customers so far who have selected that feature.

3. The ith customer then selects Poisson(α / i) new features.

Below is the probability of the binary matrix Z. The top term is the probability of the K dishes; the bottom term accounts for permutations of equivalent columns.
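For reference (the equation on the slide is an image), the standard IBP probability of the equivalence class [Z], from Griffiths & Ghahramani, is:

P([Z]) = \frac{\alpha^K}{\prod_{h=1}^{2^N - 1} K_h!} \, \exp(-\alpha H_N) \, \prod_{k=1}^{K} \frac{(N - m_k)! \, (m_k - 1)!}{N!}

where m_k = \sum_n z_{nk} is the number of customers selecting dish k, H_N = \sum_{j=1}^N 1/j is the Nth harmonic number, and K_h counts the columns of Z with binary history h.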

Page 4: Variational Inference for the Indian Buffet Process

The Stick-Breaking Construction of the IBP

• Rather than marginalizing out π, where π_k is the probability of selecting dish k, a stick-breaking construction can be used.* (Note: the above generative process is written by the presenter; in the paper, the probability values are presented in decreasing order, as below.)

• This stick-breaking representation holds for this specific parameterization of the beta distribution, i.e., v_k ~ Beta(α, 1).

v_k ~ Beta(α, 1),    π_k = ∏_{j=1}^{k} v_j,    z_nk ~ Bernoulli(π_k)

* Y.W. Teh, D. Görür & Z. Ghahramani (2007). Stick-breaking construction for the Indian buffet process. 11th AISTATS.
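As a minimal illustrative sketch (my code, not the paper's; the function name and the truncation level K are assumptions for demonstration), the truncated stick-breaking generative process in Python:

import numpy as np

def sample_ibp_stick_breaking(N, alpha, K, seed=None):
    """Sample a binary feature matrix Z from a K-truncated
    stick-breaking construction of the IBP:
        v_k ~ Beta(alpha, 1),  pi_k = prod_{j<=k} v_j,  z_nk ~ Bernoulli(pi_k).
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(alpha, 1.0, size=K)  # stick-breaking fractions in (0, 1)
    pi = np.cumprod(v)                # dish probabilities, strictly decreasing
    Z = rng.random((N, K)) < pi       # column k holds N Bernoulli(pi_k) draws
    return Z.astype(int), pi

For example, sample_ibp_stick_breaking(30, 5.0, 20) returns a 30 x 20 binary matrix whose later columns are increasingly sparse, matching the decreasing π_k above.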

Page 5: Variational Inference for the Indian Buffet Process

VB Inference for the Stick-Breaking Construction

Focus is on inference for the model parameters: the stick-breaking weights v, the binary matrix Z, and any model-specific parameters (e.g., the feature loadings in the linear-Gaussian model).

A lower-bound approximation is needed for one of the terms. This is given at right, where the authors introduce an auxiliary multinomial distribution q and optimize over this parameter (lower right).

This bound is for the likelihood of z; the posterior of v is more complicated. Using this multinomial lower bound, “terms decompose independently for each vm and we get a closed form exponential family update.”
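Reconstructing the bound described above (a sketch consistent with the paper's description, not a verbatim copy): the troublesome term is E_q[log(1 − ∏_{m=1}^k v_m)], which arises from log p(z_nk = 0 | v). Using the identity 1 − ∏_{m=1}^k v_m = ∑_{m=1}^k (1 − v_m) ∏_{j<m} v_j and Jensen's inequality with an auxiliary multinomial q_k(m),

E_q\left[\log\left(1 - \prod_{m=1}^{k} v_m\right)\right] \ge \sum_{m=1}^{k} q_k(m) \left( E[\log(1 - v_m)] + \sum_{j=1}^{m-1} E[\log v_j] - \log q_k(m) \right),

and the bound is tightest at q_k(m) ∝ exp( E[log(1 − v_m)] + ∑_{j<m} E[log v_j] ), where the expectations are digamma expressions in the Beta variational parameters.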

Page 6: Variational Inference for the Indian Buffet Process

Truncation Error for VB Inference

Given a truncation of the stick-breaking construction at level K, how close are we to the infinite model?

A bound is given using the same motivation as Ishwaran & James* in their calculation for the Dirichlet process: bound the probability that any of the N observations uses a feature beyond the truncation level.

After deriving approximations, an upper bound on this truncation error is obtained. At right, the bound is compared against an estimate of the true value from 1000 Monte Carlo simulations, with N = 30 and α = 5.

* H. Ishwaran & L.F. James (2001). Gibbs sampling methods for stick-breaking priors. JASA.
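As a sketch of why such a bound decays quickly (my derivation of the flavor of the result, not the paper's exact statement): under v_k ~ Beta(α, 1), E[v_k] = α/(1+α), so E[π_k] = (α/(1+α))^k, and the expected number of entries of Z beyond truncation level K is

N \sum_{k > K} E[\pi_k] = N \alpha \left(\frac{\alpha}{1+\alpha}\right)^{K}.

By Markov's inequality, the probability that any observation uses a feature beyond level K decays geometrically in K at rate α/(1+α).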

Page 7: Variational Inference for the Indian Buffet Process

Results: Synthetic Data

(lower left) Data were randomly generated, and log-likelihoods of held-out test data under the inferred models were computed as a function of run time. The results indicate that variational inference is both more accurate and faster here.

(right) More information about speed for the toy problem.
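For context, the linear-Gaussian likelihood model used in these experiments (standard in the IBP literature; the notation here is mine) treats each observation as a sum of its active features plus Gaussian noise:

X = Z A + E,  with  A_{kd} ~ N(0, \sigma_A^2)  and  E_{nd} ~ N(0, \sigma_n^2),

so row n of the binary matrix Z selects which rows of the feature matrix A contribute to observation X_n.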

Page 8: Variational Inference for the Indian Buffet Process

Results: Two Real Datasets

1. Yale Faces: 721 32 × 32 images of 14 people with different expressions and lighting.

2. Speech Data: 245 observations from 10 microphones and 5 speakers

At right, we can see that the variational inference method outperforms Gibbs sampling and is faster on the Yale Faces dataset.

Performance and speed are worse on the speech dataset. One reason is that this dataset is only 10-dimensional, while Yale is 1024-dimensional. In this low-dimensional setting, MCMC inference is already fast, and the cost of the VB approximation becomes apparent.

