Post on 21-Dec-2015
Amos Storkey, School of Informatics.
Density Traversal Clustering and Generative Kernels
A generative framework for spectral clustering
Amos Storkey, Tom G Griffiths
University of Edinburgh
Amos Storkey, School of Informatics, University of Edinburgh
Prior work
• Tishby and Slonim
• Meila and Shi
• Coifman et al.
• Nadler et al.
Argument
• A priori dependence on data.
• No generative model.
• Inconsistent with underlying density.
• Clusters are spatial characteristics that are properties of distributions.
• Clusters are only properties of data sets in as much as they inherit the property from the underlying distribution from which the data was generated.
But we do know
• Know diffusion asymptotics, but probabilistic formalism inconsistent with data density:
– Finite time-step, infinite data limit equilibrium distribution does not match data distribution.
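The mismatch can be seen in a finite-sample analogue. The sketch below is an illustration and is not taken from the slides: it assumes a Gaussian affinity and the usual row-normalised random walk, and the toy data, `sigma` and all names are hypothetical. For a symmetric affinity the walk's stationary distribution is proportional to the degree of each point, whereas the empirical data distribution puts equal mass 1/n on every point.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Symmetric Gaussian affinity W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))           # toy data, purely illustrative
W = gaussian_affinity(X, sigma=1.0)    # sigma is an assumed bandwidth

degree = W.sum(axis=1)
P_walk = W / degree[:, None]           # row-stochastic random walk, D^{-1} W

pi = degree / degree.sum()             # stationary distribution of the walk
uniform = np.full(len(X), 1.0 / len(X))  # empirical data distribution

print("stationary:", np.allclose(pi @ P_walk, pi))
print("matches empirical distribution:", np.allclose(pi, uniform))
```

The degree-weighted equilibrium favours points in dense regions over and above their empirical weight, which is the kind of inconsistency the slide points at.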
Density Traversal Clustering
• Define discrete time, continuous, diffusing Markov chain.
• Definition dependent on some latent distribution.
• Call this the Traversal Distribution.
The Markov chain
• Transition with probability
$$P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)\, S^{-1}(x_t)}{Z(x_{t-1})}, \qquad Z(x) = \int dy\, D(y, x)\, P^*(y)\, S^{-1}(y)$$
• D(y,x) is Gaussian centred at x, P* is Traversal distribution.
• Here S is given by the solution of
$$S(x) = \int dy\, \frac{P^*(y)\, D(y, x)}{S(y)}$$
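A short check of why the extra term S matters. This assumes the slide's equations read P(x_t | x_{t-1}) = D(x_t, x_{t-1}) P*(x_t) S^{-1}(x_t) / Z(x_{t-1}), with Z(x) the integral of D(y,x) P*(y) S^{-1}(y) over y and S(x) the integral of P*(y) D(y,x) / S(y) over y, and that the Gaussian D is symmetric in its arguments. Under those assumptions the normaliser Z coincides with S, and the traversal distribution P* is the equilibrium of the chain:

```latex
\begin{align*}
Z(x) &= \int dy\, D(y,x)\, P^*(y)\, S^{-1}(y) = S(x), \\
P(y \mid x) &= \frac{D(y,x)\, P^*(y)}{S(y)\, S(x)}, \\
\int dx\, P^*(x)\, P(y \mid x)
  &= \frac{P^*(y)}{S(y)} \int dx\, \frac{P^*(x)\, D(x,y)}{S(x)}
   = \frac{P^*(y)}{S(y)}\, S(y) = P^*(y).
\end{align*}
```

So, if this reading is right, one step of the chain started from P* returns P*, which is the consistency with the underlying density that the earlier slides ask for.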
Problems
• Random walk in continuous space.
• Each step involves many intractable integrals.
• Real Bayesians would...
• Good prior distributions over distributions is a hard problem, but need prior for traversal distributions.
CHEAT
• Doing all the integrals is not possible, but...
– All integrals are with respect to traversal distribution
– Use empirical data proxy
– All the integrals now become sample estimates: sums over the data points.
– Everything is computable in the space of data points.
– WORKS!: never need to evaluate the probability at a point, only integrals over regions.
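A minimal sketch of the sample-estimate idea, under stated assumptions. Replacing the integral over the traversal distribution by an average over the data turns the equation for S into s_i = (1/n) Σ_j W_ij / s_j, with W a Gaussian affinity. The damped fixed-point update, the function names, the toy data and `sigma` are all choices made for this illustration and are not taken from the slides.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Usual affinity: W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_consistency(W, n_iter=200):
    """Solve s_i = (1/n) sum_j W_ij / s_j by a damped fixed-point update."""
    n = W.shape[0]
    s = np.sqrt(W.mean(axis=1))          # positive starting guess
    for _ in range(n_iter):
        rhs = (W @ (1.0 / s)) / n        # right-hand side of the equation
        s = np.sqrt(s * rhs)             # geometric-mean damping (assumed scheme)
    return s

def traversal_matrix(W, s):
    """Sample transition matrix T_ij = W_ij / (n s_i s_j)."""
    n = W.shape[0]
    return W / (n * np.outer(s, s))

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 2))             # toy data, purely illustrative
W = gaussian_affinity(X, sigma=1.0)
s = solve_consistency(W)
T = traversal_matrix(W, s)

uniform = np.full(len(X), 1.0 / len(X))  # empirical data distribution
print("rows sum to one:", np.allclose(T.sum(axis=1), 1.0))
print("empirical distribution is stationary:", np.allclose(uniform @ T, uniform))
```

In this sketch the matrix T is symmetric and doubly stochastic, so the equal-weight empirical distribution is its equilibrium. The slides write the operator as A = W S^{-1}; the symmetric form used here is one possible normalisation and may differ from theirs by a rescaling.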
We get…
• Scaled likelihood $P(x_i \mid \text{centre } x_j) / P(x_i) = n\,(A^D)_{ij}$
– $A = W S^{-1}$
– W is usual affinity
– $S^{-1}$ is extra consistency term.
• More generally have out of sample scaled likelihood:
– $P(x \mid \text{centre } y) / P(x) = n\, a(x)^T A^{D-2}\, b(y)$, where a(x) and b(x) are the traversal probabilities to and from x.
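A hedged illustration of the in-sample scaled likelihood, reading the exponent D as the number of traversal steps. The sketch uses a symmetric sample transition matrix T_ij = W_ij / (n s_i s_j), with s solving s_i = (1/n) Σ_j W_ij / s_j, in place of the slide's A; that normalisation, the damped solver, the two-blob toy data, `sigma` and the step count are all assumptions made for the example.

```python
import numpy as np

def traversal_matrix(X, sigma, n_iter=200):
    """Sample transition matrix T_ij = W_ij / (n s_i s_j) (assumed normalisation)."""
    n = len(X)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    s = np.sqrt(W.mean(axis=1))
    for _ in range(n_iter):
        s = np.sqrt(s * (W @ (1.0 / s)) / n)   # damped fixed-point update
    return W / (n * np.outer(s, s))

rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=[6.0, 0.0], scale=0.5, size=(20, 2))
X = np.vstack([blob_a, blob_b])          # toy two-cluster data
n = len(X)

T = traversal_matrix(X, sigma=1.0)
D = 8                                    # number of traversal steps (assumed)
scaled = n * np.linalg.matrix_power(T, D)   # scaled[i, j] ~ P(x_i | centre x_j) / P(x_i)

within = scaled[:20, :20].mean()         # centre and point in the same blob
across = scaled[:20, 20:].mean()         # centre and point in different blobs
print("within-cluster mean:", within, "across-cluster mean:", across)
```

Because every column of T^D sums to one, each column of `scaled` averages to one over the data points, which is what a likelihood divided by the empirical P(x_i) = 1/n should do.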
Initial distribution
• Can consider other initial distributions.
• Specifically can consider delta functions at mixture centres.
• Variational Bayesian Mixture models…
Number of clusters
• Scaled likelihoods for a three cluster problem.
Number of clusters
• Scaled likelihoods for a five cluster problem.
Conclusion
• A priori formulation of spectral clustering.
• Can be used as any other spectral procedure
• But also provides scaled likelihoods – can be combined with Bayesian procedures.
• Variational Bayesian formalism.
• Small sample approximation issues.
• Better to have a flexible density estimator.
Generative Kernels
• Related to Seeger: Covariance Kernels from Bayesian Generative Models
Gaussian Process over X space
Data is obtained by diffusing in X space using the traversal process...
Density, and corresponding traversal process.
And then local averaging and additive noise.
Generative Kernels
• Covariance Kij is
$$K(r, s) = \int dx\, dy\; P(x \text{ sourced } r)\, P(y \text{ sourced } s)\, K(x, y)$$
• Again use sample estimates.
• Presume measured target is local average.
• Just standard basis function derivation of GP.
Motivation
• Generative model generates clustered data positions.
• Targets diffuse using traversal process.
• Target values suffer locality averaging influence:
– Diffused objects locally influence one another’s target values so everyone becomes like their neighbours.
• E.g. Accents.
• Can add local measurement noise.
Kernel Clustering
• Use sample estimates again to get kernel
$$K(r, s) = \sum_{ij} a(r)^T A^{D-1}_{\cdot i}\; a(s)^T A^{D-1}_{\cdot j}\; K_{ij}$$
• Can also incorporate a prior over iterations and integrate out.
• For example can use matrix exponential exp(A) instead of $A^D$.
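One way to read the matrix exponential as integrating out the number of iterations: since exp(A) = Σ_k A^k / k!, it equals e times the average of A^k under a Poisson(1) prior on the step count k. The slides do not name the prior, so the Poisson reading, the small hand-made matrix and the function names below are assumptions for illustration.

```python
import numpy as np
from math import exp, factorial

def poisson_mixture_of_powers(T, rate=1.0, k_max=40):
    """sum_k Poisson(k; rate) T^k: the k-step operator averaged over a Poisson prior."""
    total = np.zeros_like(T)
    power = np.eye(T.shape[0])           # T^0
    for k in range(k_max + 1):
        weight = exp(-rate) * rate ** k / factorial(k)
        total += weight * power
        power = power @ T                # advance to T^(k+1)
    return total

def expm_symmetric(M):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(M)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T

# Small symmetric, doubly stochastic matrix standing in for the traversal operator.
T = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.1, 0.9]])

rate = 1.0
mixture = poisson_mixture_of_powers(T, rate)
closed_form = exp(-rate) * expm_symmetric(rate * T)
print("mixture equals exp(-rate) * exp(rate * T):", np.allclose(mixture, closed_form))
```

A different prior over the step count would give a different matrix function in place of the exponential; the Poisson case is simply the one that reproduces exp(A) up to a constant.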
Generating targets for rings data
• Can generate from the model:
• Across cluster covariance is low.
• Within cluster continuity.
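A rough sketch of generating targets on two concentric rings. It is not the slides' exact kernel: it assumes independent unit-variance sources at the data points (base kernel equal to the identity), so the covariance reduces to M Mᵀ with M = T^D, and it uses the symmetric sample transition matrix T_ij = W_ij / (n s_i s_j). Ring radii, noise level, `sigma` and the step count are hypothetical.

```python
import numpy as np

def traversal_matrix(X, sigma, n_iter=200):
    """Sample transition matrix T_ij = W_ij / (n s_i s_j) (assumed normalisation)."""
    n = len(X)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    s = np.sqrt(W.mean(axis=1))
    for _ in range(n_iter):
        s = np.sqrt(s * (W @ (1.0 / s)) / n)   # damped fixed-point update
    return W / (n * np.outer(s, s))

def make_rings(rng, n_per_ring=40, radii=(1.0, 3.0), noise=0.05):
    """Two noisy concentric rings with evenly spaced angles."""
    points, labels = [], []
    for label, radius in enumerate(radii):
        angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
        ring = radius * np.column_stack([np.cos(angles), np.sin(angles)])
        points.append(ring + noise * rng.normal(size=ring.shape))
        labels.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(labels)

rng = np.random.default_rng(2)
X, labels = make_rings(rng)
T = traversal_matrix(X, sigma=0.5)

M = np.linalg.matrix_power(T, 10)        # 10 traversal steps (assumed)
cov = M @ M.T                            # covariance with identity base kernel
targets = M @ rng.normal(size=len(X))    # one draw with covariance M M^T

same_ring = labels[:, None] == labels[None, :]
within = cov[same_ring].mean()
across = cov[~same_ring].mean()
print("mean within-ring covariance:", within)
print("mean across-ring covariance:", across)
```

Drawing targets as M z with z standard normal avoids factorising the covariance. In this toy setup the targets vary smoothly around each ring while the two rings stay nearly uncorrelated, matching the two observations on the slide.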