Amos Storkey, School of Informatics. Density Traversal Clustering and Generative Kernels a...


Density Traversal Clustering and Generative Kernels

a generative framework for spectral clustering

Amos Storkey, Tom G Griffiths

University of Edinburgh


Attribute Generalisation

Prior work

• Tishby and Slonim
• Meila and Shi
• Coifman et al.
• Nadler et al.

Example: Transition Matrix

Example: 20 Iterations

Argument

• A priori dependence on data.
• No generative model.
• Inconsistent with underlying density.

• Clusters are spatial characteristics that are properties of distributions.

• Clusters are only properties of data sets in as much as they inherit the property from the underlying distribution from which the data was generated.

But we do know

• We know the diffusion asymptotics, but the probabilistic formalism is inconsistent with the data density:
– For a finite time-step, the equilibrium distribution in the infinite-data limit does not match the data distribution.

Density Traversal Clustering

• Define a discrete-time, continuous-space, diffusing Markov chain.

• Definition dependent on some latent distribution.
• Call this the Traversal Distribution.

The Markov chain

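• Transition with probability

$$P(x_{t+1} \mid x_t) = \frac{D(x_{t+1}, x_t)\, P^*(x_{t+1})\, S(x_{t+1})}{Z(x_t)}, \qquad Z(x) = \int dy\, D(y, x)\, P^*(y)\, S(y)$$

• D(y,x) is Gaussian centred at x, P* is Traversal distribution.

• Here S is given by the solution of

$$S(x) = \int dy\, P^*(y)\, D(y, x)$$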

Generative procedure

Problems

• Random walk in continuous space
• Each step involves many intractable integrals.
• Real Bayesians would...
• Finding good prior distributions over distributions is a hard problem, but we need a prior for traversal distributions.

• Doing all the integrals is not possible, but...
– All integrals are with respect to the traversal distribution
– Use empirical data proxy
– All the integrals now become sample estimates: sums over the data points.
– Everything is computable in the space of data points.
– WORKS!: never need to evaluate the probability at a point, only integrals over regions.
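
As a rough illustration of the empirical data proxy: an integral of the form ∫ dy P*(y) D(y, x) is replaced by the average of D(x_j, x) over the data points x_j. The sketch below assumes an isotropic Gaussian D with a hypothetical width `sigma`; the function names and the toy data are illustrative and not taken from the talk.

```python
import numpy as np

def gaussian_d(y, x, sigma=1.0):
    """Isotropic Gaussian density of each row of y, centred at each row of x.

    sigma is an assumed kernel width; the talk does not specify one.
    Returns an array of shape (len(y), len(x)).
    """
    y = np.atleast_2d(y)
    x = np.atleast_2d(x)
    dim = x.shape[1]
    sq_dist = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * sigma ** 2) ** (dim / 2.0)
    return np.exp(-0.5 * sq_dist / sigma ** 2) / norm

def sample_integral(data, x, sigma=1.0):
    """Sample estimate of the integral of P*(y) D(y, x) over y.

    The traversal distribution P* is replaced by the empirical data,
    so the integral becomes a mean over the data points.
    """
    return gaussian_d(data, x, sigma).mean(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 1))   # toy data standing in for samples from P*
estimate = sample_integral(data, np.array([[0.0]]), sigma=1.0)
```

For this toy case (standard normal data, unit-width D) the exact integral at x = 0 is 1/sqrt(4π), so the estimate can be checked against a known value.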

We get…

• Scaled likelihood $P(x_i \mid \text{centre } x_j) / P(x_i) = n\,(A^D)_{ij}$

– $A = W S^{-1}$

– W is usual affinity

– $S^{-1}$ is extra consistency term.

• More generally have out of sample scaled likelihood:
– $P(x \mid \text{centre } y) / P(x) = n\, a(x)^T (A^{D-2})\, b(y)$, where a(x) and b(x) are the traversal probabilities to and from x.
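
A minimal sketch of the in-sample scaled likelihood, under several loudly stated assumptions: D is taken to be the number of traversal steps, so the matrix in question is the D-th power of A; W is a Gaussian affinity with a hypothetical width `sigma`; and S is simplified to the diagonal matrix of column sums of W, which makes A column-stochastic. The talk defines S as the solution of a consistency equation, so the S used by the authors may differ from this stand-in.

```python
import numpy as np

def scaled_likelihoods(points, sigma, steps):
    """Return n * A^steps for a column-stochastic A built from a Gaussian affinity."""
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * sq_dist / sigma ** 2)   # usual affinity W
    s = w.sum(axis=0)                         # ASSUMPTION: S = diag(column sums of W)
    a = w / s[None, :]                        # A = W S^{-1}; column j holds P(next | current = x_j)
    n = points.shape[0]
    return n * np.linalg.matrix_power(a, steps)

rng = np.random.default_rng(1)
points = np.concatenate([rng.normal(-5.0, 0.5, size=(20, 2)),
                         rng.normal(+5.0, 0.5, size=(20, 2))])
sl = scaled_likelihoods(points, sigma=1.0, steps=20)
```

With two well-separated toy clusters, entries of `sl` for pairs in the same cluster are much larger than entries for pairs in different clusters, which is the behaviour used to read off cluster structure.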

Example: Scaled likelihoods

Initial distribution

• Can consider other initial distributions.
• Specifically can consider delta functions at mixture centres.
• Variational Bayesian Mixture models…

Number of clusters

• Scaled likelihoods for a three-cluster problem.

Number of clusters

• Scaled likelihoods for a five-cluster problem.

Cluster allocations

Conclusion

• A priori formulation of spectral clustering.
• Can be used as any other spectral procedure
• But also provides scaled likelihoods – can be combined with Bayesian procedures.
• Variational Bayesian formalism.
• Small sample approximation issues.
• Better to have a flexible density estimator.

Generative Kernels

• Related to Seeger: Covariance Kernels from Bayesian Generative Models

Gaussian Process over X space

Data is obtained by diffusing in X space using the traversal process...

Density, and corresponding traversal process.

And then local averaging and additive noise.

Generative Kernels

• Covariance Kij is

$$K(r, s) = \int dx\, dy\, P(x \text{ sourced } r)\, P(y \text{ sourced } s)\, K(x, y)$$

• Again use sample estimates.
• Presume measured target is local average.
• Just standard basis function derivation of GP.
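
One way to picture the sample estimate of this covariance: the double integral over x and y becomes a double sum over data points, weighted by the probability of reaching each data point from the two sources. The sketch below is an assumption-laden illustration, not the authors' formula: it takes the sources to be data points, uses the columns of a power of a column-stochastic traversal matrix as the "sourced" probabilities, and uses a squared-exponential base covariance with a hypothetical length scale `base_length`.

```python
import numpy as np

def traversal_kernel(points, sigma, steps, base_length=1.0):
    """Sample-estimate covariance between data points used as sources.

    K[r, s] = sum_ij p[i, r] * k_base[i, j] * p[j, s]
    where p[i, r] approximates P(x_i sourced r).
    """
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * sq_dist / sigma ** 2)
    a = w / w.sum(axis=0, keepdims=True)        # ASSUMPTION: column-stochastic traversal matrix
    p = np.linalg.matrix_power(a, steps)        # p[i, r]: probability of being at x_i after `steps` steps from x_r
    k_base = np.exp(-0.5 * sq_dist / base_length ** 2)  # ASSUMPTION: squared-exponential base covariance
    return p.T @ k_base @ p

rng = np.random.default_rng(2)
points = np.concatenate([rng.normal(-5.0, 0.5, size=(20, 2)),
                         rng.normal(+5.0, 0.5, size=(20, 2))])
kernel = traversal_kernel(points, sigma=1.0, steps=20)
```

Because the result has the form P^T K P with a positive semi-definite base covariance K, it is itself a valid covariance matrix, and on this toy data the covariance between points in different clusters is far smaller than within a cluster.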

Motivation

• Generative model generates clustered data positions.

• Targets diffuse using traversal process.
• Target values suffer locality averaging influence:
– Diffused objects locally influence one another’s target values so everyone becomes like their neighbours.
• E.g. Accents.
• Can add local measurement noise.

Kernel Clustering

• Use sample estimates again to get kernel

• Can also incorporate a prior over iterations and integrate out.

• For example can use matrix exponential exp(A) instead of $A^D$.

[Equation not recoverable from the transcript: the kernel K(r, s) written in terms of a(r), a(s), powers of A, and K.]
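
To see why a matrix exponential can arise from a prior over iterations, note the identity e^{-λ} exp(λA) = Σ_k Poisson(k; λ) A^k. The choice of a Poisson prior here is an assumption made for illustration; the talk only states that exp(A) can replace the fixed power of A. The function name, `rate` and `max_steps` are hypothetical.

```python
import numpy as np

def poisson_mixture_of_powers(a, rate=1.0, max_steps=60):
    """Sum over k of Poisson(k; rate) * A^k, truncated at max_steps."""
    a = np.asarray(a, dtype=float)
    result = np.zeros_like(a)
    power = np.eye(a.shape[0])       # A^0
    weight = np.exp(-rate)           # Poisson(0; rate)
    for k in range(max_steps + 1):
        result += weight * power
        power = power @ a            # advance to A^(k+1)
        weight *= rate / (k + 1)     # advance to Poisson(k+1; rate)
    return result

# Small symmetric, doubly stochastic toy matrix (illustrative values only).
a = np.array([[0.50, 0.50, 0.00],
              [0.50, 0.25, 0.25],
              [0.00, 0.25, 0.75]])
mix = poisson_mixture_of_powers(a, rate=1.0)
```

With `rate=1.0` the result equals exp(A) up to the constant factor e^{-1}, and its columns still sum to one, so it can be read as a transition matrix averaged over the number of steps.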

Generating targets for rings data

• Can generate from the model:

• Across cluster covariance is low.

• Within cluster continuity.

The point?

• Density dependence matters in missing data problems.

• Gaussian process: data with missing targets has no influence.

• Density Traversal Kernel: data with missing targets affects kernel, and hence has influence.
