Page 1: Scalable Training of Mixture Models via  Coresets

Scalable Training of Mixture Models via Coresets

Daniel Feldman

Matthew Faulkner

Andreas Krause

MIT

Page 2: Scalable Training of Mixture Models via  Coresets

Fitting Mixtures to Massive Data

Importance Sample

EM on the full data: generally expensive. Weighted EM on the small weighted sample: fast!
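Weighted EM differs from ordinary EM only in that each point's contribution to the M-step statistics is scaled by its coreset weight. A minimal NumPy sketch with diagonal covariances; the function and parameter names are illustrative, not the authors' implementation:

```python
import numpy as np

def weighted_em_gmm(X, u, k, n_iter=50, seed=0, eps=1e-9):
    """EM for a diagonal-covariance GMM where point i carries weight u[i]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # init means from the data
    var = np.tile(X.var(axis=0), (k, 1)) + eps     # init per-component variances
    pi = np.full(k, 1.0 / k)                       # mixing weights
    for _ in range(n_iter):
        # E-step: log-responsibility of component j for point i
        log_r = np.stack([
            np.log(pi[j])
            - 0.5 * np.sum(np.log(2 * np.pi * var[j]) + (X - mu[j]) ** 2 / var[j], axis=1)
            for j in range(k)
        ], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: every sufficient statistic is scaled by the coreset weight u
        ru = r * u[:, None]
        Nj = ru.sum(axis=0) + eps
        pi = Nj / Nj.sum()
        mu = (ru.T @ X) / Nj[:, None]
        var = np.stack([(ru[:, j] @ (X - mu[j]) ** 2) / Nj[j] for j in range(k)]) + eps
    return pi, mu, var
```

With uniform weights u ≡ 1 this reduces to ordinary EM; on a coreset, the weights let the small sample stand in for the full data set.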

Page 3: Scalable Training of Mixture Models via  Coresets

Coresets for Mixture Models

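Stated roughly, the defining property: a weighted subset C of P, with weights u, is an ε-coreset if its weighted log-likelihood uniformly approximates that of the full data,

$$\bigl|\,\mathcal{L}(P \mid \theta) - \mathcal{L}_u(C \mid \theta)\,\bigr| \;\le\; \varepsilon\,\bigl|\mathcal{L}(P \mid \theta)\bigr| \quad \text{for every mixture } \theta,$$

where $\mathcal{L}(P \mid \theta) = \sum_{x \in P} \ln p(x \mid \theta)$ and $\mathcal{L}_u(C \mid \theta) = \sum_{x \in C} u(x) \ln p(x \mid \theta)$. Fitting on C is then provably almost as good as fitting on P.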

Page 4: Scalable Training of Mixture Models via  Coresets

Naïve Uniform Sampling


Page 5: Scalable Training of Mixture Models via  Coresets

Naïve Uniform Sampling

Sample a set U of m points uniformly.

The small cluster is missed: high variance.
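Quantitatively: if a small cluster holds a fraction $\alpha$ of the points, a uniform sample of $m$ points misses it entirely with probability

$$(1 - \alpha)^m \approx e^{-\alpha m},$$

so for $\alpha = 10^{-3}$, even $m = 1000$ samples miss the cluster with probability about $1/e \approx 0.37$.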

Page 6: Scalable Training of Mixture Models via  Coresets

Sampling Distribution


Bias sampling towards small clusters

Page 7: Scalable Training of Mixture Models via  Coresets

Importance Weights

Weights: inversely proportional to the sampling probability.

Page 8: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points


Page 9: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random

Page 10: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove half the blue points nearest the samples


Page 16: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove half the blue points nearest the samples

Small clusters are represented.
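A sketch of this halving loop (NumPy as above, Euclidean distances); `find_representatives` and its parameters are illustrative names, not the paper's code. Each round keeps only the half of the data far from the current sample, so about log(n) rounds suffice, and a small far-away cluster survives the pruning until it is finally sampled:

```python
def find_representatives(X, m=10, seed=0):
    """Iteratively pick representatives B: sample m points uniformly,
    then discard the half of the remaining points nearest the sample."""
    rng = np.random.default_rng(seed)
    remaining = X
    reps = []
    while len(remaining) > m:
        sample = remaining[rng.choice(len(remaining), size=m, replace=False)]
        reps.append(sample)
        # distance from each remaining point to its nearest sampled point
        d = np.linalg.norm(remaining[:, None, :] - sample[None, :, :], axis=2).min(axis=1)
        remaining = remaining[d > np.median(d)]   # drop the closer half
    reps.append(remaining)                        # leftovers join B
    return np.vstack(reps)
```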

Page 17: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Partition the data via a Voronoi diagram centered at the representative points.

Page 18: Scalable Training of Mixture Models via  Coresets

Creating a Sampling Distribution

Sampling distribution: points in sparse cells, and points far from the centers, get more mass.

Page 19: Scalable Training of Mixture Models via  Coresets

Importance Weights

Sampling distribution: points in sparse cells, and points far from the centers, get more mass.

Weights: each sampled point receives weight inversely proportional to its sampling probability.
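One way to realize this as code, continuing the sketch above; the exact scores below (distance to the nearest representative plus a term that inflates sparse Voronoi cells) are a plausible instantiation, not necessarily the paper's exact expression:

```python
def sampling_distribution(X, B):
    """Give more mass to points far from B and to points in sparse cells."""
    dists = np.linalg.norm(X[:, None, :] - B[None, :, :], axis=2)
    d = dists.min(axis=1)                    # distance to nearest representative
    cell = dists.argmin(axis=1)              # Voronoi cell index of each point
    cell_size = np.bincount(cell, minlength=len(B))
    score = d / (d.sum() + 1e-12) + 1.0 / (len(B) * cell_size[cell])
    return score / score.sum()
```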

Page 20: Scalable Training of Mixture Models via  Coresets


Importance Sample
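Putting the (hypothetical) pieces together: draw the coreset i.i.d. from the biased distribution and attach inverse-probability weights, so weighted sums over the coreset are unbiased estimates of sums over the full data:

```python
def importance_sample(X, p, m, seed=0):
    """Draw m points i.i.d. from p; weight each by 1/(m p) for unbiasedness."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    return X[idx], 1.0 / (m * p[idx])

def build_coreset(X, m, seed=0):
    B = find_representatives(X, seed=seed)
    p = sampling_distribution(X, B)
    return importance_sample(X, p, m, seed=seed)

# C, u = build_coreset(X, m=500); pi, mu, var = weighted_em_gmm(C, u, k=3)
```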

Page 21: Scalable Training of Mixture Models via  Coresets


Coresets via Adaptive Sampling

Page 22: Scalable Training of Mixture Models via  Coresets

A General Coreset Framework

Contributions for mixture models: coresets of size independent of n; streaming and parallel constructions; extensions to other mixture models.

Page 23: Scalable Training of Mixture Models via  Coresets

A Geometric Perspective

Gaussian level sets can be expressed purely geometrically, in terms of an affine subspace.
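Concretely, the level sets of a Gaussian with mean $\mu$ and covariance $\Sigma$ are the ellipsoids

$$\{\, x \;:\; (x - \mu)^\top \Sigma^{-1} (x - \mu) = c \,\},$$

so statements about the density become statements about scaled Euclidean distances to $\mu$ and to affine subspaces spanned by the principal directions, objects that geometric coreset constructions already handle.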

Page 24: Scalable Training of Mixture Models via  Coresets

Geometric Reduction

Lifts geometric coreset tools to mixture models

Soft-min
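The reduction hinges on the soft-min bound: with per-component costs $D_j(x) = -\ln\bigl(w_j\,p_j(x)\bigr)$, the mixture's negative log-density is a soft minimum of $k$ values, and any soft-min is sandwiched by the hard min,

$$\min_j D_j(x) - \ln k \;\le\; -\ln \sum_{j=1}^{k} e^{-D_j(x)} \;\le\; \min_j D_j(x).$$

Approximating the geometric (hard-min) cost therefore approximates the statistical cost up to an additive $\ln k$, which is how geometric coreset guarantees transfer to mixtures.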

Page 25: Scalable Training of Mixture Models via  Coresets

Semi-Spherical Gaussian Mixtures


Page 26: Scalable Training of Mixture Models via  Coresets

Extensions and Generalizations


Level Sets

Page 27: Scalable Training of Mixture Models via  Coresets

Composition of Coresets

Merge [cf. Har-Peled & Mazumdar '04]


Page 28: Scalable Training of Mixture Models via  Coresets

Composition of Coresets

Compress

Merge [Har-Peled & Mazumdar '04]


Page 29: Scalable Training of Mixture Models via  Coresets

Coresets on Streams

Compress

Merge [Har-Peled & Mazumdar '04]



Page 31: Scalable Training of Mixture Models via  Coresets

Coresets on Streams

Compress

Merge [Har-Peled & Mazumdar '04]

Error grows linearly with the number of compressions.

Page 32: Scalable Training of Mixture Models via  Coresets

Coresets on Streams

Error grows with the height of the tree.
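A sketch of the merge-reduce stream, reusing the hypothetical `build_coreset` above; `compress` here resamples by weight, a simplification of re-running the full construction. Coresets are buffered one per level, like carries in binary addition, so a stream of n points yields a tree of height O(log n) and the error grows with log n rather than with the raw number of compressions:

```python
def compress(C, u, m, seed=0):
    """Shrink a weighted coreset back to m points by weighted resampling
    (a simplification of re-running the coreset construction on the union)."""
    rng = np.random.default_rng(seed)
    p = u / u.sum()
    idx = rng.choice(len(C), size=m, replace=True, p=p)
    return C[idx], np.full(m, u.sum() / m)

def stream_coresets(chunks, m):
    """Merge-reduce over a stream of data chunks."""
    levels = {}                                  # tree level -> (points, weights)
    for chunk in chunks:
        C, u = build_coreset(chunk, m)           # leaf coreset at level 0
        lvl = 0
        while lvl in levels:                     # two coresets at one level:
            Cp, up = levels.pop(lvl)             #   merge = union, keep weights
            C, u = compress(np.vstack([C, Cp]), np.concatenate([u, up]), m)
            lvl += 1                             #   carry to the next level
        levels[lvl] = (C, u)
    Cs, us = zip(*levels.values())               # final: merge remaining levels
    return np.vstack(Cs), np.concatenate(us)
```

The same two operations give the parallel (MapReduce) variant on the next slide: build per-shard coresets in the map phase, then merge and compress up a balanced reduction tree.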

Page 33: Scalable Training of Mixture Models via  Coresets


Coresets in Parallel

Page 34: Scalable Training of Mixture Models via  Coresets

Handwritten Digits

Obtain 100-dimensional features from 28x28 pixel images via PCA; fit a GMM with k=10 components.

MNIST data: 60,000 training, 10,000 testing
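A sketch of this pipeline, assuming scikit-learn for the data and PCA steps and reusing the hypothetical `build_coreset` / `weighted_em_gmm` above; the coreset size m is illustrative:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

X = fetch_openml("mnist_784", version=1).data.to_numpy()[:60000]
Z = PCA(n_components=100).fit_transform(X)   # 28x28 = 784 pixels -> 100 dims
C, u = build_coreset(Z, m=2000)              # small weighted coreset
pi, mu, var = weighted_em_gmm(C, u, k=10)    # one component per digit class
```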

Page 35: Scalable Training of Mixture Models via  Coresets

Neural Tetrode Recordings

Waveforms of neural activity at four co-located electrodes in a live rat hippocampus; 4 x 38 samples = 152 dimensions.

T. Siapas et al., Caltech

Page 36: Scalable Training of Mixture Models via  Coresets

Community Seismic Network

Detect and monitor earthquakes using smartphones, USB sensors, and cloud computing.

CSN Sensors Worldwide

Page 37: Scalable Training of Mixture Models via  Coresets

Learning User Acceleration


17-dimensional acceleration feature vectors

[Figure: examples labeled "Bad" and "Good"]

Page 38: Scalable Training of Mixture Models via  Coresets


Seismic Anomaly Detection

[Figure: examples labeled "Bad" and "Good"]

GMM used for anomaly detection

Page 39: Scalable Training of Mixture Models via  Coresets

Conclusions

• Lift geometric coreset tools to the statistical realm
  - New complexity result for GMM level sets

• Parallel (MapReduce) and streaming implementations

• Strong empirical performance; enables learning on mobile devices

• GMMs admit coresets of size independent of n
  - Extensions for other mixture models

