Scalable Training of Mixture Models via Coresets


Daniel Feldman

Matthew Faulkner

Andreas Krause

MIT

Fitting Mixtures to Massive Data

Importance Sample

EM on the full data set: generally expensive. Weighted EM on the importance sample: fast!

Coresets for Mixture Models
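The formal definition on this slide lives in a figure. Paraphrasing the paper's notion (treat the exact form below as my reconstruction, not a quote from the slide): a weighted set C is a (k, ε)-coreset for the data set D if its weighted log-likelihood matches the full-data log-likelihood for every candidate mixture θ,

```latex
\mathcal{L}(D \mid \theta) = \sum_{x \in D} \ln p(x \mid \theta),
\qquad
\mathcal{L}_{w}(C \mid \theta) = \sum_{x \in C} w(x)\,\ln p(x \mid \theta),
\qquad\text{and}\qquad
\bigl|\,\mathcal{L}(D\mid\theta) - \mathcal{L}_{w}(C\mid\theta)\,\bigr|
\;\le\; \varepsilon\,\bigl|\mathcal{L}(D\mid\theta)\bigr|
\quad\text{for all mixtures } \theta .
```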

Naïve Uniform Sampling

• Sample a set U of m points uniformly at random
• Small clusters can be missed entirely
• High variance
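To make the failure concrete, here is a small numerical illustration (my own example, not from the talk): with a cluster holding 0.5% of the points, a uniform sample of m = 100 points misses it entirely in roughly 60% of trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a large cluster (99.5% of points) and a small cluster (0.5%).
big   = rng.normal(loc=0.0,  scale=1.0, size=(9950, 2))
small = rng.normal(loc=10.0, scale=0.2, size=(50, 2))
data  = np.vstack([big, small])

m = 100          # uniform sample size
trials = 1000
misses = 0
for _ in range(trials):
    idx = rng.choice(len(data), size=m, replace=False)
    # The small cluster occupies the last 50 rows of `data`.
    if not np.any(idx >= 9950):
        misses += 1

print(f"small cluster entirely missed in {misses/trials:.0%} of trials")
# Expected miss rate is roughly (1 - 0.005)**100, about 0.61.
```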

Sampling Distribution

Bias sampling towards small clusters.

Importance Weights

Weights compensate for the biased sampling distribution: each sampled point is weighted inversely to its sampling probability.

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove the half of the remaining points nearest the samples
• Repeat on the points that remain

Small clusters are represented: their points lie far from the early samples (which mostly land in dense regions), so they survive until late rounds and eventually get sampled. A sketch of the procedure follows.
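A minimal Python sketch of the iterative procedure just described; the per-round sample size and the use of plain Euclidean distance are placeholder choices, not the paper's exact parameters.

```python
import numpy as np

def representative_points(data, sample_per_round=10, rng=None):
    """Iteratively pick representatives, as on the slides: sample a few points
    uniformly, then discard the half of the remaining data nearest to them."""
    if rng is None:
        rng = np.random.default_rng(0)
    remaining = data.copy()
    reps = []
    while len(remaining) > sample_per_round:
        # Sample a small set uniformly at random from the remaining points.
        idx = rng.choice(len(remaining), size=sample_per_round, replace=False)
        sampled = remaining[idx]
        reps.append(sampled)
        # Distance from every remaining point to its nearest sampled point.
        d = np.linalg.norm(remaining[:, None, :] - sampled[None, :, :], axis=2).min(axis=1)
        # Remove the half of the points nearest the samples; keep the far half.
        remaining = remaining[d > np.median(d)]
    reps.append(remaining)  # the last few stragglers become representatives too
    return np.vstack(reps)
```

Since the data roughly halves each round, the loop runs about log2(n) times, and points in small, far-away clusters survive long enough to be picked.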

Creating a Sampling Distribution

Partition the data via a Voronoi diagram centered at the representative points.

Sampling distribution: points in sparse cells get more mass, and so do points far from the centers.

Importance Weights

Each sampled point is weighted inversely to its sampling probability, so that weighted sums over the sample remain unbiased estimates of sums over the full data set.
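A sketch of turning the representatives into a sampling distribution. The paper derives a precise sensitivity bound; the formula below is only a simplified proxy that captures the two effects named on the slide (more mass for far-away points and for points in sparse Voronoi cells).

```python
import numpy as np

def sampling_distribution(data, reps):
    """Sampling probabilities q(x): more mass to points far from the
    representatives and to points in sparse Voronoi cells.
    (Simplified proxy for the paper's sensitivity bound.)"""
    # Voronoi partition: assign each point to its nearest representative.
    d = np.linalg.norm(data[:, None, :] - reps[None, :, :], axis=2)
    cell = np.argmin(d, axis=1)
    dist = d[np.arange(len(data)), cell]

    # Size of the Voronoi cell each point belongs to.
    cell_size = np.bincount(cell, minlength=len(reps))[cell]

    q = dist / (dist.sum() + 1e-12) + 1.0 / (len(reps) * cell_size)
    return q / q.sum()
```

The importance weight of a point sampled into a coreset of size m is then 1 / (m * q(x)).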

Importance Sample
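Putting the pieces together: draw the weighted importance sample and fit it with weighted EM. scikit-learn's EM does not take per-point weights, so this sketch hand-rolls weighted EM for a spherical GMM; the coreset size m and component count k are placeholders, and `data`, `representative_points`, and `sampling_distribution` come from the sketches above.

```python
import numpy as np

def weighted_em_spherical_gmm(X, w, k, n_iter=100, rng=None):
    """EM for a k-component spherical GMM in which point x_i counts w_i times."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, dim = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # init means at random points
    var = np.full(k, X.var())                      # per-component variance
    pi = np.full(k, 1.0 / k)                       # mixing weights

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log-space for stability.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)          # (n, k)
        logp = np.log(pi) - 0.5 * dim * np.log(2 * np.pi * var) - sq / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: every sufficient statistic is weighted by w_i * r_ik.
        wr = w[:, None] * r
        Nk = wr.sum(axis=0) + 1e-12
        mu = (wr.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (wr * sq).sum(axis=0) / (dim * Nk) + 1e-6
        pi = Nk / Nk.sum()
    return pi, mu, var

# Draw the importance sample (the coreset) and fit the mixture on it.
rng = np.random.default_rng(0)
reps = representative_points(data)
q = sampling_distribution(data, reps)
m = 200                                        # coreset size (placeholder)
idx = rng.choice(len(data), size=m, p=q)       # importance sample
coreset, weights = data[idx], 1.0 / (m * q[idx])
pi, mu, var = weighted_em_spherical_gmm(coreset, weights, k=2)
```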


Coresets via Adaptive Sampling

A General Coreset Framework

Contributions for Mixture Models

A Geometric Perspective

Gaussian level sets can be expressed purely geometrically, in terms of an affine subspace.
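One standard way to make this concrete (my gloss, not necessarily the parametrization shown in the figure): the level sets of a Gaussian density are ellipsoids, i.e. purely geometric objects. In D dimensions,

```latex
\{\, x \in \mathbb{R}^{D} : \mathcal{N}(x;\mu,\Sigma) = c \,\}
  \;=\; \{\, x : (x-\mu)^{\top}\Sigma^{-1}(x-\mu) = r^{2} \,\},
\qquad
r^{2} \;=\; -2\ln\!\bigl(c\,(2\pi)^{D/2}\lvert\Sigma\rvert^{1/2}\bigr).
```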

Geometric Reduction

Lifts geometric coreset tools to mixture models via a soft-min of per-component costs.
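The soft-min is the usual log-sum-exp bound (my paraphrase of the reduction): the negative log-likelihood of a k-component mixture is, up to an additive ln k, the minimum of the per-component costs, which ties the statistical objective to a geometric k-clustering cost. With spherical components in D dimensions,

```latex
-\ln \sum_{j=1}^{k} w_j\, e^{-d_j(x)}
\;\in\;
\Bigl[\,\min_j \bigl(d_j(x) - \ln w_j\bigr) - \ln k,\;
       \min_j \bigl(d_j(x) - \ln w_j\bigr)\,\Bigr],
\qquad
d_j(x) = \frac{\lVert x-\mu_j\rVert^{2}}{2\sigma_j^{2}}
       + \frac{D}{2}\ln\bigl(2\pi\sigma_j^{2}\bigr).
```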

Semi-Spherical Gaussian Mixtures


Extensions and Generalizations


Level Sets

Composition of Coresets

Merge: the union of two coresets is a coreset for the union of the underlying data sets [cf. Har-Peled, Mazumdar '04].
Compress: taking a coreset of a coreset shrinks it again, at the price of compounding the approximation error.

Coresets on Streams

Compressing after every merge makes the error grow linearly with the number of compressions. Arranging the merges in a balanced binary tree makes the error grow only with the height of the tree, i.e. logarithmically in the length of the stream.

Coresets in Parallel

The same merge-and-compress composition parallelizes naturally (e.g., in MapReduce): build coresets of data chunks independently, then merge and compress them up a balanced tree. A sketch of this bookkeeping follows.
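A sketch of the merge-and-reduce bookkeeping in the Bentley-Saxe / Har-Peled-Mazumdar style; `build_coreset` is a stand-in for the importance-sampling construction above, and the chunk and coreset sizes are placeholders. Keeping at most one coreset per tree level is what bounds the error by the tree height rather than by the stream length.

```python
import numpy as np

def build_coreset(points, weights, m, rng):
    """Stand-in for the importance-sampling construction above: weighted
    subsampling to m points, preserving total weight (placeholder only)."""
    p = weights / weights.sum()
    idx = rng.choice(len(points), size=min(m, len(points)), p=p)
    return points[idx], np.full(len(idx), weights.sum() / len(idx))

class StreamingCoreset:
    """Merge-and-reduce over a stream: keep at most one coreset per tree level,
    so the error grows with the tree height (log of the stream length),
    not with the number of compressions."""
    def __init__(self, chunk_size=1000, m=200, seed=0):
        self.chunk_size, self.m = chunk_size, m
        self.rng = np.random.default_rng(seed)
        self.levels = {}   # tree level -> (points, weights)
        self.buffer = []

    def add(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.chunk_size:
            chunk = np.asarray(self.buffer); self.buffer = []
            leaf = build_coreset(chunk, np.ones(len(chunk)), self.m, self.rng)
            self._carry(0, leaf)

    def _carry(self, level, c):
        # Like binary addition: merge equal-level coresets and push the result up.
        while level in self.levels:
            p0, w0 = self.levels.pop(level)
            merged_pts = np.vstack([p0, c[0]])
            merged_w = np.concatenate([w0, c[1]])
            c = build_coreset(merged_pts, merged_w, self.m, self.rng)  # compress
            level += 1
        self.levels[level] = c

    def coreset(self):
        # Union of the per-level coresets (ignores a partially filled buffer).
        pts = np.vstack([p for p, _ in self.levels.values()])
        w = np.concatenate([w for _, w in self.levels.values()])
        return pts, w
```

In the parallel setting the map step builds a coreset per chunk and the reduce step merges and compresses pairs up the same balanced tree.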

Handwritten Digits

Obtain 100-dimensional features from 28x28-pixel images via PCA. Fit a GMM with k = 10 components.

MNIST data: 60,000 training images, 10,000 test images.
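A full-data baseline for this experimental setup using scikit-learn (a sketch only, not the coreset pipeline; the diagonal covariance and other settings are my choices, and a weighted coreset fit would need a routine like the weighted EM sketched earlier, since scikit-learn's EM does not take per-point weights).

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# MNIST: 70,000 handwritten digits, each a 28x28 = 784-pixel image.
X, _ = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test = X[:60000] / 255.0, X[60000:] / 255.0

# 100-dimensional PCA features, as in the experiment.
pca = PCA(n_components=100).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# GMM with k = 10 components, fit on the full training set as a baseline.
gmm = GaussianMixture(n_components=10, covariance_type="diag", random_state=0)
gmm.fit(Z_train)
print("average held-out log-likelihood:", gmm.score(Z_test))
```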

Neural Tetrode Recordings

Waveforms of neural activity at four co-located electrodes in a live rat hippocampus. 4 x 38 samples = 152 dimensions.

T. Siapas et al., Caltech

Community Seismic Network

Detect and monitor earthquakes using smart phones, USB sensors, and cloud computing.

CSN Sensors Worldwide

Learning User Acceleration

17-dimensional acceleration feature vectors


Seismic Anomaly Detection

A GMM fit to everyday acceleration data is used for anomaly detection: windows with low likelihood under the model are flagged as candidate seismic events.
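A minimal version of that idea (a sketch: the synthetic features, component count, and threshold are placeholders): fit a GMM to "normal" acceleration features and flag low-likelihood windows.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for 17-dimensional acceleration feature vectors.
normal_features = rng.normal(size=(5000, 17))          # everyday phone movement
new_features    = rng.normal(size=(100, 17)) + 4.0     # unusual shaking

# Fit the GMM to "normal" data only.
gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
gmm.fit(normal_features)

# Flag windows whose log-likelihood falls below a low quantile of the normal data.
threshold = np.quantile(gmm.score_samples(normal_features), 0.01)
anomalous = gmm.score_samples(new_features) < threshold
print(f"{anomalous.mean():.0%} of the new windows flagged as anomalous")
```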

Conclusions

• Lift geometric coreset tools to the statistical realm; new complexity result for GMM level sets

• Parallel (MapReduce) and streaming implementations

• Strong empirical performance; enables learning on mobile devices

• GMMs admit coresets of size independent of n; extensions for other mixture models