Sketched Learning from Random Feature Moments
Nicolas Keriven
Ecole Normale Supérieure (Paris)
CFM-ENS chair in Data Science
(thesis with Rémi Gribonval at Inria Rennes)
Imaging in Paris, Apr. 5th 2018
Context: machine learning

Database → Learning → Task (e.g. "= cat"):
- Clustering
- Classification
- etc.

Challenges: large databases (learning is slow, costly), distributed databases, data streams.

Idea! A small intermediate representation:
1: Compression
2: Learning

Desired properties:
- Fast to compute (distributed, streaming, GPU…)
- Preserve the desired information
- Preserve data privacy

2/21
Three compression schemes

Data = collection of vectors; database → feature extraction → compression?

1. Dimensionality reduction (see e.g. [Calderbank 2009, Boutsidis 2010])
- Random projection
- Feature selection

2. Subsampling / coresets (see e.g. [Feldman 2010])
- Uniform sampling (naive)
- Adaptive sampling…

3. Linear sketch (see [Thaper 2002, Cormode 2011]); distributed, streaming
- Hash tables, histograms
- Sketching for learning?

3/21
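A toy illustration of why the linear sketch suits distributed and streaming settings: since it is an average of per-sample features, per-shard sketches merge exactly. This is a minimal sketch; the random Fourier feature map `sketch` and all sizes are illustrative choices, not the talk's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 64

# Hypothetical feature map: random Fourier features (one possible choice).
W = rng.standard_normal((m, d))

def sketch(X):
    """Average of per-sample features: linear in the empirical distribution."""
    return np.exp(1j * X @ W.T).mean(axis=0)

# Two data shards, e.g. held on different machines.
X1, X2 = rng.standard_normal((300, d)), rng.standard_normal((700, d))

# Merging shard sketches by a weighted average gives the sketch of the union.
merged = (len(X1) * sketch(X1) + len(X2) * sketch(X2)) / (len(X1) + len(X2))
full = sketch(np.vstack([X1, X2]))
```

The same property gives constant-memory streaming updates: a running average is updated one sample at a time.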
How-to: build a sketch

What is a sketch? Any linear sketch = empirical moments: z = (1/n) Σ_i Φ(x_i) for some feature map Φ.

What is contained in a sketch?
• Φ(x) = x: mean
• Φ(x) = powers of x: moment
• Φ(x) = indicator functions: histogram
• Proposed: kernel random features [Rahimi 2007] (random projections + non-linearity)

Questions:
• What information is preserved by the sketching?
• How to retrieve this information?
• What is a sufficient number of features?

Intuition: sketching as a linear embedding
- Assumption: the data are i.i.d. samples from a distribution π
- Linear operator: A(π) = E_{x~π} Φ(x), so that the sketch is z = A(π̂_n) with π̂_n the empirical distribution
- "Noisy" linear measurement: z = A(π) + noise, with small noise (vanishing as n grows)

Dimensionality-reducing, random, linear embedding: Compressive Sensing?

4/21
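The "any linear sketch = empirical moments" view can be made concrete numerically. A hedged sketch (the Gaussian data, the sizes, and the frequency matrix `W` are illustrative) showing that each choice of feature map Φ yields a different moment vector, and that the sketch is a noisy linear measurement of the underlying distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 2, 50, 20000

# Draw data from some distribution pi (a standard Gaussian, for illustration).
X = rng.standard_normal((n, d))

# Any linear sketch is an empirical moment z = (1/n) sum_i Phi(x_i):
z_mean = X.mean(axis=0)                      # Phi(x) = x    -> mean
z_mom2 = (X ** 2).mean(axis=0)               # Phi(x) = x^2  -> 2nd moment
W = rng.standard_normal((m, d))              # random projections
z_rff = np.exp(1j * X @ W.T).mean(axis=0)    # kernel random features [Rahimi 2007]

# "Noisy" linear measurement of pi: z = A(pi) + noise, noise -> 0 as n grows.
# For a standard Gaussian, E exp(i w.x) = exp(-|w|^2 / 2) (characteristic function).
A_pi = np.exp(-0.5 * (W ** 2).sum(axis=1))
noise = np.abs(z_rff - A_pi).max()
```

Here `A_pi` plays the role of the exact linear measurement A(π), and `noise` is the empirical deviation, of order 1/sqrt(n).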
Sketched learning in this talk

Classical compressive sensing: random matrix. Here: random features, averaged.

Compressive Sensing analogy:
• Dimensionality reduction, random operator
• (Ill-posed) inverse problem: density estimation
• Sparsity: "simple" densities (mixture model)

5/21
Result: Compressive k-means [Keriven et al 2017]

Mixture of Diracs = k-means.

Application: spectral clustering for MNIST classification [Uw 2001] (classification performance):
- Twice as fast as k-means
- 4 orders of magnitude more memory efficient

6/21
Gaussian mixture models (GMM)

Experiment (d = 10, k = 20): error vs. size of the database; faster than EM (VLFeat's gmm).

Application: speaker verification [Reynolds 2000] (d = 12, k = 64)
• EM on 300 000 vectors: 29.53
• 20 kB sketch computed on a 50 GB database: 28.96

7/21
In this talk

Q: Theoretical guarantees? Inspired by Compressive Sensing:
• 1: with the Restricted Isometry Property (RIP)
• 2: with dual certificates

8/21
Outline

• Information-preservation guarantees: a RIP analysis (joint work with R. Gribonval, G. Blanchard, Y. Traonmilin)
• Total variation regularization: a dual certificate analysis
• Conclusion, outlooks
Recall: Linear inverse problem

True distribution → sketch.
• Estimation problem = linear inverse problem on measures
• Extremely ill-posed!
• Feasibility? (information preservation): what can the best possible algorithm achieve?

9/21
Information preservation guarantees
Nicolas Keriven 10/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven 10/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven 10/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven 10/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven
GoalProve the existence of a decoder robustto noise and stable to modeling error.
« Instance-optimal » decoder
10/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven
GoalProve the existence of a decoder robustto noise and stable to modeling error.
Lower Restricted Isometry Property
« Instance-optimal » decoder
10/21
: Model set of « simple » distributions (eg. GMMs)
Information preservation guarantees
Nicolas Keriven
Non-convex generalized moment matching
GoalProve the existence of a decoder robustto noise and stable to modeling error.
Lower Restricted Isometry Property
« Instance-optimal » decoder
10/21
Information preservation guarantees

Model set of "simple" distributions (e.g. GMMs).

Goal: prove the existence of a decoder robust to noise and stable to modeling error (an "instance-optimal" decoder).
Key tool: the Lower Restricted Isometry Property (LRIP). The decoder is a non-convex generalized moment matching.
New goal: find/construct models and operators that satisfy the LRIP (w.h.p.)

10/21
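The non-convex generalized moment matching decoder can be sketched numerically. A minimal 1-D example with k = 2 Diracs; all sizes are illustrative, and the optimizer is initialized near the truth for simplicity, whereas practical decoders use greedy initialization heuristics:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
m, n = 40, 2000

# Data: two clusters around centroids -3 and 3 (a mixture of near-Diracs).
centroids = np.array([-3.0, 3.0])
X = rng.choice(centroids, size=n) + 0.1 * rng.standard_normal(n)

# Random Fourier sketch z = (1/n) sum_i exp(i w x_i).
w = 0.5 * rng.standard_normal(m)
z = np.exp(1j * np.outer(X, w)).mean(axis=0)

# Moment matching: fit a 2-Dirac model to the sketch (non-convex in theta).
def objective(theta):
    model = 0.5 * (np.exp(1j * w * theta[0]) + np.exp(1j * w * theta[1]))
    return np.sum(np.abs(z - model) ** 2)

res = minimize(objective, x0=np.array([-2.5, 2.4]))  # init near the truth
recovered = np.sort(res.x)
```

With k components in dimension d the same objective has on the order of k*d variables plus weights; this is the non-convex minimization whose output the RIP analysis controls.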
Appropriate metric

Goal: LRIP, for an appropriate metric. A reproducing kernel induces the kernel mean embedding of distributions; random features [Rahimi 2007] are used to approximate the kernel. This metric is the basis for the LRIP.

11/21
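The link between random features and the kernel can be checked numerically. A small sketch, assuming the Gaussian kernel and its standard Fourier feature sampling (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 5000

# Random Fourier features phi(x) = (1/sqrt(m)) exp(i W x) with rows of W
# drawn from N(0, I): inner products of features approximate the Gaussian
# kernel k(x, y) = exp(-||x - y||^2 / 2) [Rahimi 2007].
W = rng.standard_normal((m, d))

def phi(x):
    return np.exp(1j * W @ x) / np.sqrt(m)

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = np.real(np.vdot(phi(x), phi(y)))      # Monte-Carlo estimate
exact = np.exp(-0.5 * np.sum((x - y) ** 2))    # closed-form Gaussian kernel
```

The kernel mean of a distribution is then approximated by the sketch: averaging `phi` over samples estimates E phi(x).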
Proof strategy (1)

Goal: LRIP. Reformulation of the LRIP over the normalized secant set.

Definition: normalized secant set (differences of model elements, normalized by the metric).

New goal: with high probability on the draw of the random features, for all elements of the normalized secant set, the sketching operator approximately preserves the norm.

12/21
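The formulas lost from this slide can be restated. A hedged reconstruction in the spirit of [Gribonval et al 2017] (the symbols, constants, and exact normalization may differ from the slide):

```latex
% Lower RIP on the model set $\mathfrak{S}$, w.r.t. a metric $d(\cdot,\cdot)$:
d(\pi, \pi') \;\le\; C \,\|\mathcal{A}\pi - \mathcal{A}\pi'\|_2
  \quad \forall\, \pi, \pi' \in \mathfrak{S}.

% Normalized secant set:
\mathcal{S} \;=\; \Bigl\{ \tfrac{\pi - \pi'}{d(\pi,\pi')} \;:\; \pi, \pi' \in \mathfrak{S},\ \pi \neq \pi' \Bigr\}.

% Reformulated goal: with high probability on the draw of the features,
\|\mathcal{A}\mu\|_2^2 \;\ge\; 1 - \delta \quad \forall\, \mu \in \mathcal{S}.
```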
Proof strategy (2)

Goal: LRIP.
• Pointwise LRIP: concentration inequality
• Extension to the full LRIP: covering numbers

13/21
Main result

Main hypothesis: the normalized secant set has finite covering numbers.
- Classic Compressive Sensing (finite dimension): known
- Here (infinite dimension): technical

Result: for a sketch size m driven by the quality of the pointwise LRIP and the dimensionality (covering numbers) of the model, w.h.p. the recovery error is controlled by the modeling error plus the empirical noise.

14/21
Application

k-means with mixtures of Diracs (no assumption on the data):
- Hypotheses: separated centroids; bounded domain for the centroids
- Sketch: adjusted random Fourier features (for technical reasons)
- Result: w.r.t. the usual k-means cost (SSE); explicit sketch size

GMM with known covariance:
- Hypotheses: sufficiently separated means; bounded domain for the means
- Sketch: Fourier features
- Result: with respect to the log-likelihood; explicit sketch size

15/21
Summary

With the RIP analysis:
• Moment matching: best decoder possible (instance optimal)
• Information-preservation guarantees
• Fine control on modeling error, noise, and metrics
• Can incorporate the k-means cost or the log-likelihood

Compressive Sensing analogy:
• Random, dimensionality-reducing operator
• Sparsity
• The information is preserved
• Convex relaxation?
Outline

• Information-preservation guarantees: a RIP analysis
• Total variation regularization: a dual certificate analysis (joint work with C. Poon, G. Peyré)
• Conclusion, outlooks
Total Variation regularization

Previously (RIP analysis): minimization by generalized moment matching.
• Must know the number of components
• Non-convex!

Convex relaxation ("super resolution"):
• Optimize over Radon measures
• Minimize the total variation norm of the measure (the "L1 norm" for measures) under an (approximate) sketch-matching constraint
• Convex: can be handled e.g. by the Frank-Wolfe algorithm [Boyd 2015], or in some cases as an SDP

Questions:
• Is the recovered measure sparse?
• Does it have the right number of components?
• Does it recover the true measure?

16/21
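To make the convex relaxation concrete, here is a crude gridded stand-in for the TV-regularized problem: a 1-D nonnegative Lasso on a discretization grid, solved by ISTA. All sizes, the grid, and the spike configuration are illustrative, and true off-the-grid solvers such as Frank-Wolfe avoid the grid entirely:

```python
import numpy as np

rng = np.random.default_rng(3)

# Ground truth: a nonnegative 2-spike measure; both spikes lie on the grid.
grid = np.linspace(-1.0, 1.0, 81)
true_pos = np.array([-0.4, 0.5])
true_amp = np.array([0.6, 0.4])

# Random Fourier measurements of the measure (normalized feature map).
m = 60
w = 8.0 * rng.standard_normal(m)
A = np.exp(1j * np.outer(w, grid)) / np.sqrt(m)
z = (np.exp(1j * np.outer(w, true_pos)) / np.sqrt(m)) @ true_amp

# ISTA for min_a 0.5*||A a - z||^2 + lam*||a||_1 with a >= 0:
# gradient step followed by (nonnegative) soft thresholding.
lam = 0.01
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
a = np.zeros(len(grid))
for _ in range(2000):
    grad = np.real(A.conj().T @ (A @ a - z))
    a = np.maximum(a - (grad + lam) / L, 0.0)

# The recovered mass should concentrate near the true spike locations.
support = grid[a > 0.1]
```

The slide's questions map directly to this toy problem: whether `a` is sparse, has the right number of significant components, and places them at the true positions.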
A bit of convex analysis

Intuition: first-order conditions at the solution.

Def.: dual certificate (= Lagrange multiplier in the noiseless case…): a function in the range of the adjoint of the sketching operator that matches the signs of the amplitudes at the true components and is strictly below 1 in magnitude elsewhere. Ensures uniqueness and robustness…

17/21
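The bullet conditions garbled on this slide are presumably the standard certificate conditions. A hedged reconstruction in the usual super-resolution notation, for a true measure $\mu_0 = \sum_i a_i \delta_{x_i}$ and sketching operator $\mathcal{A}$ with $m$ features:

```latex
\eta = \mathcal{A}^* p \quad \text{for some } p \in \mathbb{C}^m, \text{ with}
\begin{cases}
\eta(x_i) = \operatorname{sign}(a_i) & \text{for every spike } x_i,\\
|\eta(x)| < 1 & \text{otherwise}.
\end{cases}
```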
Strategy: going back to random features

Step 1: study the full (expected) kernel.
Assumptions: kernel "well-behaved"; components sufficiently separated.

Step 2: bound the deviations with finitely many random features.
• Pointwise deviation (concentration inequality)
• Covering numbers

(Figure: certificates for m = 10, 20, 50 random features.)

18/21
Results for separated GMM

Assumption: data are actually drawn from a GMM…

1: Ideal scaling in sparsity (in progress…)
• The recovered measure is not necessarily sparse, but:
• its mass is concentrated around the true components
• Proof: infinite-dimensional golfing scheme (new)

2: Minimal norm certificate [Duval, Peyré 2015] (in progress…)
• When n is high enough: sparse, with the right number of components
• Proof: adaptation of [Tang, Recht 2013] (constructive!)

19/21
Sketch learning

• Sketching: streaming, distributed learning
• An original view on data compression and generalized moments
• Combines random features and kernel mean embeddings with infinite-dimensional Compressive Sensing

20/21
Summary, outlooks

• RIP analysis
  • Information preservation guarantees
  • Fine control on noise, modeling error (instance-optimal decoder) and recovery metrics
  • Necessary and sufficient conditions
  • But: non-convex minimization

• Dual certificate analysis
  • Convex minimization
  • Does not handle modeling error
  • In some cases, automatically guesses the right number of components

• Outlooks
  • Algorithms for TV minimization
  • Other features (not necessarily random…)
  • Other "sketched" learning tasks
  • Multilayer sketches?

21/21
Thank you!
Nicolas Keriven
• Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models. Information & Inference: a Journal of the IMA, 2017. <arXiv:1606.02838>
• Keriven, Tremblay, Traonmilin, Gribonval. Compressive K-means. ICASSP, 2017.
• Gribonval, Blanchard, Keriven, Traonmilin. Compressive Statistical Learning with Random Feature Moments. Preprint, 2017. <arXiv:1706.07180>
• Keriven. Sketching for Large-Scale Learning of Mixture Models. PhD thesis. <tel-01620815>
• Poon, Keriven, Peyré. A Dual Certificates Analysis of Compressive Off-the-Grid Recovery. Submitted.
• Code: sketchml.gforge.inria.fr; GitHub: nkeriven