Scalable Training of Mixture Models via Coresets


Daniel Feldman

Matthew Faulkner

Andreas Krause

MIT

Fitting Mixtures to Massive Data

Importance Sample

EM on the full data set: generally expensive. Weighted EM on the importance sample: fast!

Coresets for Mixture Models
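The formal definition on this slide lives in a figure. Paraphrasing the paper's notion (treat the exact form below as my reconstruction, not a quote from the slide): a weighted set C is a (k, ε)-coreset for the data set D if its weighted log-likelihood matches the full-data log-likelihood for every candidate mixture θ,

```latex
\mathcal{L}(D \mid \theta) = \sum_{x \in D} \ln p(x \mid \theta),
\qquad
\mathcal{L}_{w}(C \mid \theta) = \sum_{x \in C} w(x)\,\ln p(x \mid \theta),
\qquad\text{and}\qquad
\bigl|\,\mathcal{L}(D\mid\theta) - \mathcal{L}_{w}(C\mid\theta)\,\bigr|
\;\le\; \varepsilon\,\bigl|\mathcal{L}(D\mid\theta)\bigr|
\quad\text{for all mixtures } \theta .
```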

Naïve Uniform Sampling

• Sample a set U of m points uniformly at random
• Small clusters can be missed entirely
• High variance
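To make the failure concrete, here is a small numerical illustration (my own example, not from the talk): with a cluster holding 0.5% of the points, a uniform sample of m = 100 points misses it entirely in roughly 60% of trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a large cluster (99.5% of points) and a small cluster (0.5%).
big   = rng.normal(loc=0.0,  scale=1.0, size=(9950, 2))
small = rng.normal(loc=10.0, scale=0.2, size=(50, 2))
data  = np.vstack([big, small])

m = 100          # uniform sample size
trials = 1000
misses = 0
for _ in range(trials):
    idx = rng.choice(len(data), size=m, replace=False)
    # The small cluster occupies the last 50 rows of `data`.
    if not np.any(idx >= 9950):
        misses += 1

print(f"small cluster entirely missed in {misses/trials:.0%} of trials")
# Expected miss rate is roughly (1 - 0.005)**100, about 0.61.
```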

Sampling Distribution

Bias sampling towards small clusters.

Importance Weights

Weights compensate for the biased sampling distribution: each sampled point is weighted inversely to its sampling probability.

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove the half of the remaining points nearest the samples
• Repeat on the points that remain

Small clusters are represented: their points lie far from the early samples (which mostly land in dense regions), so they survive until late rounds and eventually get sampled. A sketch of the procedure follows.
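A minimal Python sketch of the iterative procedure just described; the per-round sample size and the use of plain Euclidean distance are placeholder choices, not the paper's exact parameters.

```python
import numpy as np

def representative_points(data, sample_per_round=10, rng=None):
    """Iteratively pick representatives, as on the slides: sample a few points
    uniformly, then discard the half of the remaining data nearest to them."""
    if rng is None:
        rng = np.random.default_rng(0)
    remaining = data.copy()
    reps = []
    while len(remaining) > sample_per_round:
        # Sample a small set uniformly at random from the remaining points.
        idx = rng.choice(len(remaining), size=sample_per_round, replace=False)
        sampled = remaining[idx]
        reps.append(sampled)
        # Distance from every remaining point to its nearest sampled point.
        d = np.linalg.norm(remaining[:, None, :] - sampled[None, :, :], axis=2).min(axis=1)
        # Remove the half of the points nearest the samples; keep the far half.
        remaining = remaining[d > np.median(d)]
    reps.append(remaining)  # the last few stragglers become representatives too
    return np.vstack(reps)
```

Since the data roughly halves each round, the loop runs about log2(n) times, and points in small, far-away clusters survive long enough to be picked.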

Creating a Sampling Distribution

Partition the data via a Voronoi diagram centered at the representative points.

Sampling distribution: points in sparse cells get more mass, and so do points far from the centers.

Importance Weights

Each sampled point is weighted inversely to its sampling probability, so that weighted sums over the sample remain unbiased estimates of sums over the full data set.
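A sketch of turning the representatives into a sampling distribution. The paper derives a precise sensitivity bound; the formula below is only a simplified proxy that captures the two effects named on the slide (more mass for far-away points and for points in sparse Voronoi cells).

```python
import numpy as np

def sampling_distribution(data, reps):
    """Sampling probabilities q(x): more mass to points far from the
    representatives and to points in sparse Voronoi cells.
    (Simplified proxy for the paper's sensitivity bound.)"""
    # Voronoi partition: assign each point to its nearest representative.
    d = np.linalg.norm(data[:, None, :] - reps[None, :, :], axis=2)
    cell = np.argmin(d, axis=1)
    dist = d[np.arange(len(data)), cell]

    # Size of the Voronoi cell each point belongs to.
    cell_size = np.bincount(cell, minlength=len(reps))[cell]

    q = dist / (dist.sum() + 1e-12) + 1.0 / (len(reps) * cell_size)
    return q / q.sum()
```

The importance weight of a point sampled into a coreset of size m is then 1 / (m * q(x)).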

Importance Sample
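Putting the pieces together: draw the weighted importance sample and fit it with weighted EM. scikit-learn's EM does not take per-point weights, so this sketch hand-rolls weighted EM for a spherical GMM; the coreset size m and component count k are placeholders, and `data`, `representative_points`, and `sampling_distribution` come from the sketches above.

```python
import numpy as np

def weighted_em_spherical_gmm(X, w, k, n_iter=100, rng=None):
    """EM for a k-component spherical GMM in which point x_i counts w_i times."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, dim = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # init means at random points
    var = np.full(k, X.var())                      # per-component variance
    pi = np.full(k, 1.0 / k)                       # mixing weights

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log-space for stability.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)          # (n, k)
        logp = np.log(pi) - 0.5 * dim * np.log(2 * np.pi * var) - sq / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: every sufficient statistic is weighted by w_i * r_ik.
        wr = w[:, None] * r
        Nk = wr.sum(axis=0) + 1e-12
        mu = (wr.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (wr * sq).sum(axis=0) / (dim * Nk) + 1e-6
        pi = Nk / Nk.sum()
    return pi, mu, var

# Draw the importance sample (the coreset) and fit the mixture on it.
rng = np.random.default_rng(0)
reps = representative_points(data)
q = sampling_distribution(data, reps)
m = 200                                        # coreset size (placeholder)
idx = rng.choice(len(data), size=m, p=q)       # importance sample
coreset, weights = data[idx], 1.0 / (m * q[idx])
pi, mu, var = weighted_em_spherical_gmm(coreset, weights, k=2)
```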


Coresets via Adaptive Sampling

A General Coreset Framework

Contributions for Mixture Models

A Geometric Perspective

Gaussian level sets can be expressed purely geometrically, in terms of an affine subspace.
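One standard way to make this concrete (my gloss, not necessarily the parametrization shown in the figure): the level sets of a Gaussian density are ellipsoids, i.e. purely geometric objects. In D dimensions,

```latex
\{\, x \in \mathbb{R}^{D} : \mathcal{N}(x;\mu,\Sigma) = c \,\}
  \;=\; \{\, x : (x-\mu)^{\top}\Sigma^{-1}(x-\mu) = r^{2} \,\},
\qquad
r^{2} \;=\; -2\ln\!\bigl(c\,(2\pi)^{D/2}\lvert\Sigma\rvert^{1/2}\bigr).
```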

Geometric Reduction

Lifts geometric coreset tools to mixture models via a soft-min of per-component costs.
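The soft-min is the usual log-sum-exp bound (my paraphrase of the reduction): the negative log-likelihood of a k-component mixture is, up to an additive ln k, the minimum of the per-component costs, which ties the statistical objective to a geometric k-clustering cost. With spherical components in D dimensions,

```latex
-\ln \sum_{j=1}^{k} w_j\, e^{-d_j(x)}
\;\in\;
\Bigl[\,\min_j \bigl(d_j(x) - \ln w_j\bigr) - \ln k,\;
       \min_j \bigl(d_j(x) - \ln w_j\bigr)\,\Bigr],
\qquad
d_j(x) = \frac{\lVert x-\mu_j\rVert^{2}}{2\sigma_j^{2}}
       + \frac{D}{2}\ln\bigl(2\pi\sigma_j^{2}\bigr).
```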

Semi-Spherical Gaussian Mixtures


Extensions and Generalizations


Level Sets

Composition of Coresets

Merge: the union of two coresets is a coreset for the union of the underlying data sets [cf. Har-Peled, Mazumdar '04].
Compress: taking a coreset of a coreset shrinks it again, at the price of compounding the approximation error.

Coresets on Streams

Compressing after every merge makes the error grow linearly with the number of compressions. Arranging the merges in a balanced binary tree makes the error grow only with the height of the tree, i.e. logarithmically in the length of the stream.

Coresets in Parallel

The same merge-and-compress composition parallelizes naturally (e.g., in MapReduce): build coresets of data chunks independently, then merge and compress them up a balanced tree. A sketch of this bookkeeping follows.
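A sketch of the merge-and-reduce bookkeeping in the Bentley-Saxe / Har-Peled-Mazumdar style; `build_coreset` is a stand-in for the importance-sampling construction above, and the chunk and coreset sizes are placeholders. Keeping at most one coreset per tree level is what bounds the error by the tree height rather than by the stream length.

```python
import numpy as np

def build_coreset(points, weights, m, rng):
    """Stand-in for the importance-sampling construction above: weighted
    subsampling to m points, preserving total weight (placeholder only)."""
    p = weights / weights.sum()
    idx = rng.choice(len(points), size=min(m, len(points)), p=p)
    return points[idx], np.full(len(idx), weights.sum() / len(idx))

class StreamingCoreset:
    """Merge-and-reduce over a stream: keep at most one coreset per tree level,
    so the error grows with the tree height (log of the stream length),
    not with the number of compressions."""
    def __init__(self, chunk_size=1000, m=200, seed=0):
        self.chunk_size, self.m = chunk_size, m
        self.rng = np.random.default_rng(seed)
        self.levels = {}   # tree level -> (points, weights)
        self.buffer = []

    def add(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.chunk_size:
            chunk = np.asarray(self.buffer); self.buffer = []
            leaf = build_coreset(chunk, np.ones(len(chunk)), self.m, self.rng)
            self._carry(0, leaf)

    def _carry(self, level, c):
        # Like binary addition: merge equal-level coresets and push the result up.
        while level in self.levels:
            p0, w0 = self.levels.pop(level)
            merged_pts = np.vstack([p0, c[0]])
            merged_w = np.concatenate([w0, c[1]])
            c = build_coreset(merged_pts, merged_w, self.m, self.rng)  # compress
            level += 1
        self.levels[level] = c

    def coreset(self):
        # Union of the per-level coresets (ignores a partially filled buffer).
        pts = np.vstack([p for p, _ in self.levels.values()])
        w = np.concatenate([w for _, w in self.levels.values()])
        return pts, w
```

In the parallel setting the map step builds a coreset per chunk and the reduce step merges and compresses pairs up the same balanced tree.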

Handwritten Digits

Obtain 100-dimensional features from 28x28-pixel images via PCA. Fit a GMM with k = 10 components.

MNIST data: 60,000 training images, 10,000 test images.
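A full-data baseline for this experimental setup using scikit-learn (a sketch only, not the coreset pipeline; the diagonal covariance and other settings are my choices, and a weighted coreset fit would need a routine like the weighted EM sketched earlier, since scikit-learn's EM does not take per-point weights).

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# MNIST: 70,000 handwritten digits, each a 28x28 = 784-pixel image.
X, _ = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test = X[:60000] / 255.0, X[60000:] / 255.0

# 100-dimensional PCA features, as in the experiment.
pca = PCA(n_components=100).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# GMM with k = 10 components, fit on the full training set as a baseline.
gmm = GaussianMixture(n_components=10, covariance_type="diag", random_state=0)
gmm.fit(Z_train)
print("average held-out log-likelihood:", gmm.score(Z_test))
```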

Neural Tetrode Recordings

Waveforms of neural activity at four co-located electrodes in a live rat hippocampus. 4 x 38 samples = 152 dimensions.

T. Siapas et al., Caltech

Community Seismic Network

Detect and monitor earthquakes using smart phones, USB sensors, and cloud computing.

CSN Sensors Worldwide

Learning User Acceleration

17-dimensional acceleration feature vectors


Seismic Anomaly Detection

A GMM fit to everyday acceleration data is used for anomaly detection: windows with low likelihood under the model are flagged as candidate seismic events.
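A minimal version of that idea (a sketch: the synthetic features, component count, and threshold are placeholders): fit a GMM to "normal" acceleration features and flag low-likelihood windows.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for 17-dimensional acceleration feature vectors.
normal_features = rng.normal(size=(5000, 17))          # everyday phone movement
new_features    = rng.normal(size=(100, 17)) + 4.0     # unusual shaking

# Fit the GMM to "normal" data only.
gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
gmm.fit(normal_features)

# Flag windows whose log-likelihood falls below a low quantile of the normal data.
threshold = np.quantile(gmm.score_samples(normal_features), 0.01)
anomalous = gmm.score_samples(new_features) < threshold
print(f"{anomalous.mean():.0%} of the new windows flagged as anomalous")
```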

Conclusions

• Lift geometric coreset tools to the statistical realm; new complexity result for GMM level sets

• Parallel (MapReduce) and streaming implementations

• Strong empirical performance; enables learning on mobile devices

• GMMs admit coresets of size independent of n; extensions for other mixture models