Page 1: Scalable Training of Mixture Models via  Coresets

Scalable Training of Mixture Models via Coresets

Daniel Feldman

Matthew Faulkner

Andreas Krause

MIT

Page 2: Scalable Training of Mixture Models via  Coresets

Fitting Mixtures to Massive Data

Importance Sample

EM on the full data: generally expensive. Weighted EM on the small weighted sample: fast!
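Weighted EM differs from ordinary EM only in that each point's contribution to the M-step statistics is scaled by its coreset weight. A minimal NumPy sketch with diagonal covariances; the function and parameter names are illustrative, not the authors' implementation:

```python
import numpy as np

def weighted_em_gmm(X, u, k, n_iter=50, seed=0, eps=1e-9):
    """EM for a diagonal-covariance GMM where point i carries weight u[i]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # init means from the data
    var = np.tile(X.var(axis=0), (k, 1)) + eps     # init per-component variances
    pi = np.full(k, 1.0 / k)                       # mixing weights
    for _ in range(n_iter):
        # E-step: log-responsibility of component j for point i
        log_r = np.stack([
            np.log(pi[j])
            - 0.5 * np.sum(np.log(2 * np.pi * var[j]) + (X - mu[j]) ** 2 / var[j], axis=1)
            for j in range(k)
        ], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: every sufficient statistic is scaled by the coreset weight u
        ru = r * u[:, None]
        Nj = ru.sum(axis=0) + eps
        pi = Nj / Nj.sum()
        mu = (ru.T @ X) / Nj[:, None]
        var = np.stack([(ru[:, j] @ (X - mu[j]) ** 2) / Nj[j] for j in range(k)]) + eps
    return pi, mu, var
```

With uniform weights u ≡ 1 this reduces to ordinary EM; on a coreset, the weights let the small sample stand in for the full data set.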

Page 3: Scalable Training of Mixture Models via  Coresets

Coresets for Mixture Models

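Stated roughly, the defining property: a weighted subset C of P, with weights u, is an ε-coreset if its weighted log-likelihood uniformly approximates that of the full data,

$$\bigl|\,\mathcal{L}(P \mid \theta) - \mathcal{L}_u(C \mid \theta)\,\bigr| \;\le\; \varepsilon\,\bigl|\mathcal{L}(P \mid \theta)\bigr| \quad \text{for every mixture } \theta,$$

where $\mathcal{L}(P \mid \theta) = \sum_{x \in P} \ln p(x \mid \theta)$ and $\mathcal{L}_u(C \mid \theta) = \sum_{x \in C} u(x) \ln p(x \mid \theta)$. Fitting on C is then provably almost as good as fitting on P.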

Page 4: Scalable Training of Mixture Models via  Coresets

Naïve Uniform Sampling


Page 5: Scalable Training of Mixture Models via  Coresets

Naïve Uniform Sampling

Sample a set U of m points uniformly.

The small cluster is missed: high variance.
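Quantitatively: if a small cluster holds a fraction $\alpha$ of the points, a uniform sample of $m$ points misses it entirely with probability

$$(1 - \alpha)^m \approx e^{-\alpha m},$$

so for $\alpha = 10^{-3}$, even $m = 1000$ samples miss the cluster with probability about $1/e \approx 0.37$.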

Page 6: Scalable Training of Mixture Models via  Coresets

Sampling Distribution


Bias sampling towards small clusters

Page 7: Scalable Training of Mixture Models via  Coresets

Importance Weights

Weights: inversely proportional to the sampling probability.

Page 8: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points


Page 9: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random

Page 10: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove half the blue points nearest the samples


Page 16: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove half the blue points nearest the samples

Small clusters are represented.
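A sketch of this halving loop (NumPy as above, Euclidean distances); `find_representatives` and its parameters are illustrative names, not the paper's code. Each round keeps only the half of the data far from the current sample, so about log(n) rounds suffice, and a small far-away cluster survives the pruning until it is finally sampled:

```python
def find_representatives(X, m=10, seed=0):
    """Iteratively pick representatives B: sample m points uniformly,
    then discard the half of the remaining points nearest the sample."""
    rng = np.random.default_rng(seed)
    remaining = X
    reps = []
    while len(remaining) > m:
        sample = remaining[rng.choice(len(remaining), size=m, replace=False)]
        reps.append(sample)
        # distance from each remaining point to its nearest sampled point
        d = np.linalg.norm(remaining[:, None, :] - sample[None, :, :], axis=2).min(axis=1)
        remaining = remaining[d > np.median(d)]   # drop the closer half
    reps.append(remaining)                        # leftovers join B
    return np.vstack(reps)
```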

Page 17: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Partition the data via a Voronoi diagram centered at the representative points.

Page 18: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Sampling distribution: points in sparse cells, and points far from the centers, get more mass.

Page 19: Scalable Training of Mixture Models via  Coresets

Importance Weights

Sampling distribution: points in sparse cells, and points far from the centers, get more mass.

Weights: each sampled point receives weight inversely proportional to its sampling probability.
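One way to realize this as code, continuing the sketch above; the exact scores below (distance to the nearest representative plus a term that inflates sparse Voronoi cells) are a plausible instantiation, not necessarily the paper's exact expression:

```python
def sampling_distribution(X, B):
    """Give more mass to points far from B and to points in sparse cells."""
    dists = np.linalg.norm(X[:, None, :] - B[None, :, :], axis=2)
    d = dists.min(axis=1)                    # distance to nearest representative
    cell = dists.argmin(axis=1)              # Voronoi cell index of each point
    cell_size = np.bincount(cell, minlength=len(B))
    score = d / (d.sum() + 1e-12) + 1.0 / (len(B) * cell_size[cell])
    return score / score.sum()
```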

Page 20: Scalable Training of Mixture Models via  Coresets


Importance Sample
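Putting the (hypothetical) pieces together: draw the coreset i.i.d. from the biased distribution and attach inverse-probability weights, so weighted sums over the coreset are unbiased estimates of sums over the full data:

```python
def importance_sample(X, p, m, seed=0):
    """Draw m points i.i.d. from p; weight each by 1/(m p) for unbiasedness."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    return X[idx], 1.0 / (m * p[idx])

def build_coreset(X, m, seed=0):
    B = find_representatives(X, seed=seed)
    p = sampling_distribution(X, B)
    return importance_sample(X, p, m, seed=seed)

# C, u = build_coreset(X, m=500); pi, mu, var = weighted_em_gmm(C, u, k=3)
```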

Page 21: Scalable Training of Mixture Models via  Coresets


Coresets via Adaptive Sampling

Page 22: Scalable Training of Mixture Models via  Coresets

A General Coreset Framework

Contributions for mixture models: coresets of size independent of n; streaming and parallel constructions; extensions to other mixture models.

Page 23: Scalable Training of Mixture Models via  Coresets

A Geometric Perspective

Gaussian level sets can be expressed purely geometrically, in terms of an affine subspace.
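Concretely, the level sets of a Gaussian with mean $\mu$ and covariance $\Sigma$ are the ellipsoids

$$\{\, x \;:\; (x - \mu)^\top \Sigma^{-1} (x - \mu) = c \,\},$$

so statements about the density become statements about scaled Euclidean distances to $\mu$ and to affine subspaces spanned by the principal directions, objects that geometric coreset constructions already handle.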

Page 24: Scalable Training of Mixture Models via  Coresets

Geometric Reduction

Lifts geometric coreset tools to mixture models

Soft-min
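The reduction hinges on the soft-min bound: with per-component costs $D_j(x) = -\ln\bigl(w_j\,p_j(x)\bigr)$, the mixture's negative log-density is a soft minimum of $k$ values, and any soft-min is sandwiched by the hard min,

$$\min_j D_j(x) - \ln k \;\le\; -\ln \sum_{j=1}^{k} e^{-D_j(x)} \;\le\; \min_j D_j(x).$$

Approximating the geometric (hard-min) cost therefore approximates the statistical cost up to an additive $\ln k$, which is how geometric coreset guarantees transfer to mixtures.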

Page 25: Scalable Training of Mixture Models via  Coresets

Semi-Spherical Gaussian Mixtures


Page 26: Scalable Training of Mixture Models via  Coresets

Extensions and Generalizations


Level Sets

Page 27: Scalable Training of Mixture Models via  Coresets

Composition of Coresets

Merge [cf. Har-Peled & Mazumdar '04]


Page 28: Scalable Training of Mixture Models via  Coresets

Composition of Coresets

Compress

Merge [Har-Peled & Mazumdar '04]


Page 29: Scalable Training of Mixture Models via  Coresets

Coresets on Streams

Compress

Merge [Har-Peled & Mazumdar '04]



Page 31: Scalable Training of Mixture Models via  Coresets

Coresets on Streams

Compress

Merge [Har-Peled & Mazumdar '04]

Error grows linearly with the number of compressions.

Page 32: Scalable Training of Mixture Models via  Coresets

Coresets on Streams

Error grows with the height of the tree.
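A sketch of the merge-reduce stream, reusing the hypothetical `build_coreset` above; `compress` here resamples by weight, a simplification of re-running the full construction. Coresets are buffered one per level, like carries in binary addition, so a stream of n points yields a tree of height O(log n) and the error grows with log n rather than with the raw number of compressions:

```python
def compress(C, u, m, seed=0):
    """Shrink a weighted coreset back to m points by weighted resampling
    (a simplification of re-running the coreset construction on the union)."""
    rng = np.random.default_rng(seed)
    p = u / u.sum()
    idx = rng.choice(len(C), size=m, replace=True, p=p)
    return C[idx], np.full(m, u.sum() / m)

def stream_coresets(chunks, m):
    """Merge-reduce over a stream of data chunks."""
    levels = {}                                  # tree level -> (points, weights)
    for chunk in chunks:
        C, u = build_coreset(chunk, m)           # leaf coreset at level 0
        lvl = 0
        while lvl in levels:                     # two coresets at one level:
            Cp, up = levels.pop(lvl)             #   merge = union, keep weights
            C, u = compress(np.vstack([C, Cp]), np.concatenate([u, up]), m)
            lvl += 1                             #   carry to the next level
        levels[lvl] = (C, u)
    Cs, us = zip(*levels.values())               # final: merge remaining levels
    return np.vstack(Cs), np.concatenate(us)
```

The same two operations give the parallel (MapReduce) variant on the next slide: build per-shard coresets in the map phase, then merge and compress up a balanced reduction tree.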

Page 33: Scalable Training of Mixture Models via  Coresets


Coresets in Parallel

Page 34: Scalable Training of Mixture Models via  Coresets

Handwritten Digits

Obtain 100-dimensional features from 28x28 pixel images via PCA; fit a GMM with k=10 components.

MNIST data: 60,000 training, 10,000 testing
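A sketch of this pipeline, assuming scikit-learn for the data and PCA steps and reusing the hypothetical `build_coreset` / `weighted_em_gmm` above; the coreset size m is illustrative:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

X = fetch_openml("mnist_784", version=1).data.to_numpy()[:60000]
Z = PCA(n_components=100).fit_transform(X)   # 28x28 = 784 pixels -> 100 dims
C, u = build_coreset(Z, m=2000)              # small weighted coreset
pi, mu, var = weighted_em_gmm(C, u, k=10)    # one component per digit class
```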

Page 35: Scalable Training of Mixture Models via  Coresets

Neural Tetrode Recordings

Waveforms of neural activity at four co-located electrodes in a live rat hippocampus; 4 x 38 samples = 152 dimensions.

T. Siapas et al., Caltech

Page 36: Scalable Training of Mixture Models via  Coresets

Community Seismic Network

Detect and monitor earthquakes using smartphones, USB sensors, and cloud computing.

CSN Sensors Worldwide

Page 37: Scalable Training of Mixture Models via  Coresets

Learning User Acceleration


17-dimensional acceleration feature vectors

[Figure: examples labeled "Bad" and "Good"]

Page 38: Scalable Training of Mixture Models via  Coresets


Seismic Anomaly Detection

[Figure: examples labeled "Bad" and "Good"]

GMM used for anomaly detection

Page 39: Scalable Training of Mixture Models via  Coresets

Conclusions

• Lift geometric coreset tools to the statistical realm
  - New complexity result for GMM level sets

• Parallel (MapReduce) and streaming implementations

• Strong empirical performance; enables learning on mobile devices

• GMMs admit coresets of size independent of n
  - Extensions for other mixture models

