Date post: 13-Oct-2020
Page 1: Uniform Approximation of Functions with Random Bases (pages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf)

Uniform Approximation of Functions with Random Bases

Ali Rahimi Intel Research

Ben Recht UW Madison

Page 2:

•  Goal: Find a class F which is easy to search over, but can approximate complex behavior.

[Slide annotations: "dictated by application"; "Which space of functions?"; "classes, covariate, state"; "Typically a list of example inputs"]

Page 3:

Approximation Schemes

•  Approximate by

•  Jones (1992),

•  Barron (1993),

•  Girosi & Anzellotti (1995),

•  Using nearly identical analysis, all of these schemes achieve

Page 4:

Approximation Schemes

•  Approximate by

•  Parameter tuning is tricky…

•  (Can achieve via a greedy “algorithm”).

Simultaneously optimize

Page 5:

Randomize, don’t optimize

•  Approximate by

•  For which functions can we achieve ?

•  How are these functions related to objects we already know and love?

•  Practical Implementations

optimize sample

Page 6:

Function Class

•  Fix parameterized basis functions

•  Fix a probability distribution

•  Our target space will be:

•  With the convention that

Page 7:

Random Features: Example

•  Fourier basis functions:

•  Gaussian parameters

•  If , then means that the frequency distribution of f has subgaussian tails.
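
A minimal numerical sketch of this example, assuming the basis φ(x; w, b) = cos(w'x + b) with w drawn from a standard Gaussian and b uniform on [0, 2π]. The values of d, D, x and y are illustrative, not taken from the slides. Under these assumptions the inner product of the scaled feature vectors should approach the Gaussian kernel exp(-||x - y||^2 / 2) as D grows:

```matlab
% Monte Carlo check: random cosine features vs. the Gaussian kernel.
% d, D, x, y are illustrative values, not taken from the slides.
rng(0);                          % fix the seed for repeatability
d = 3;                           % input dimension
D = 20000;                       % number of random features
x = [0.3; -0.2; 0.5];
y = [0.1;  0.4; 0.0];

w = randn(D, d);                 % Gaussian frequencies, one per row
b = 2*pi*rand(D, 1);             % uniform phases in [0, 2*pi)

zx = sqrt(2/D) * cos(w*x + b);   % feature vector for x, D x 1
zy = sqrt(2/D) * cos(w*y + b);   % feature vector for y, D x 1

k_approx = zx' * zy;                  % inner product of random features
k_exact  = exp(-norm(x - y)^2 / 2);   % Gaussian kernel value
fprintf('approx %.4f, exact %.4f\n', k_approx, k_exact);
```

The two printed values should agree up to Monte Carlo error, which shrinks roughly like 1/sqrt(D).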

Page 8:

•  Thm: Let f be in with . Let θ1,…, θn be sampled iid from p. Then with probability at least 1 - δ:

•  If additionally, φ(x;θ)=φ(θ'x), with φ:R→R L-Lipschitz, φ(0)=0, and |φ|<1 and p has a finite second moment, then with probability at least 1- δ

where

Page 9:

Reproducing Kernel Hilbert Spaces

•  A symmetric function k: X × X → R is a positive definite kernel if for all N

•  Reproducing Kernel Hilbert Space:

•  Extensive Applications: Support Vector Machines, Kernel Machines, etc.
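
For reference, a sketch of the condition the first bullet leads into, assuming the slide uses the textbook definition of a positive definite kernel (the symbols c_i and x_i are notation introduced here): every finite Gram matrix built from k must be positive semidefinite.

```latex
\sum_{i=1}^{N} \sum_{j=1}^{N} c_i \, c_j \, k(x_i, x_j) \;\ge\; 0
\quad \text{for all } x_1, \dots, x_N \in \mathcal{X}
\text{ and all } c_1, \dots, c_N \in \mathbb{R}.
```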

Page 10:

•  RKHS generated by k:

•  F_p is dense in H, and for any f ∈ F_p

Page 11:

Gaussian RKHS vs Random Features

•  Representer Theorem: for many applications, the optimal function in an RKHS is of the form

•  RKHS form is preferred when the number of data points is small or the function is not smooth

•  Random features are preferred when the number of data points is very large or the Representer Theorem doesn't apply

given data set
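
In its usual textbook statement, assumed here to be the form the first bullet refers to, the Representer Theorem says that the minimizer of a regularized empirical risk over an RKHS is a kernel expansion on the N training points x_1, ..., x_N of the given data set (the coefficient names c_i are notation introduced here):

```latex
f^\star(x) \;=\; \sum_{i=1}^{N} c_i \, k(x, x_i)
```

Fitting or evaluating this form involves one kernel term per training point, which is consistent with the trade-off in the bullets above: it is convenient when N is small, while a fixed number of random features is cheaper when N is very large.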

Page 12:

Fourier Random Features

RKHS is dense in continuous functions

Page 13:

Random Decision Stumps

Boosting Features

Page 14:

Binning Random Features

Lay a random grid so that for any x and y Pr[x and y are binned together] = k(x,y)

φ(x) is the bin ID, encoded as a binary indicator vector.

[Figure: a random grid with cells numbered 1, 2, 3, …, 15, …; kernel plot with axis ticks −δ, δ and 1]
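
A minimal 1-D sketch of the binning idea, under assumptions not stated on the slide: the target kernel is taken to be the Laplacian kernel k(x,y) = exp(-|x - y|), the grid pitch is drawn from a Gamma distribution with shape 2 and scale 1, and the grid offset is uniform within one pitch. With those choices the fraction of random grids that put x and y in the same bin should approach k(x,y). The names P, x, y, delta and shift are illustrative.

```matlab
% Random binning in 1-D for the Laplacian kernel k(x,y) = exp(-|x - y|).
% P, x, y are illustrative values, not taken from the slides.
rng(0);                           % fix the seed for repeatability
P = 20000;                        % number of independent random grids
x = 0.7;  y = 1.5;

% Grid pitch: Gamma(shape 2, scale 1), drawn as a sum of two exponentials.
delta = -log(rand(P,1)) - log(rand(P,1));
shift = delta .* rand(P,1);       % grid offset, uniform on [0, delta)

bin_x = floor((x - shift) ./ delta);   % bin ID of x in each grid
bin_y = floor((y - shift) ./ delta);   % bin ID of y in each grid

k_approx = mean(bin_x == bin_y);  % fraction of grids binning x and y together
k_exact  = exp(-abs(x - y));
fprintf('approx %.4f, exact %.4f\n', k_approx, k_exact);
```

For a fixed pitch δ the collision probability is the hat function max(0, 1 - |x - y|/δ); averaging that over the random pitch is what produces the Laplacian kernel in this sketch.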

Page 15:

% Approximates Gaussian Process regression
% with Gaussian kernel of variance gamma
% lambda: regularization parameter
% dataset: X is dxN, y is 1xN
% test: xtest is dx1
% D: dimensionality of random feature

% training
w = randn(D, size(X,1));                          % random frequencies, D x d
b = 2*pi*rand(D,1);                               % random phases, D x 1
Z = cos(sqrt(gamma)*w*X + repmat(b,1,size(X,2))); % random features, D x N
alpha = (lambda*eye(D) + Z*Z')\(Z*y');            % ridge solve, D x 1

% testing
ztest = alpha(:)'*cos(sqrt(gamma)*w*xtest(:) + b);
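
A small end-to-end run of the training and testing steps on synthetic data, as a sketch rather than a benchmark. The data, the sizes d, N, D, and the values of gamma and lambda are made up for illustration, and the dimensions are written out so that the ridge system is D x D and the targets enter as a column vector.

```matlab
% Random-feature ridge regression on synthetic data (illustrative values).
rng(0);                           % fix the seed for repeatability
d = 2;  N = 500;  D = 300;        % input dim, training points, features
gamma = 4;  lambda = 1e-3;        % illustrative hyperparameters

X = 2*rand(d, N) - 1;             % inputs in [-1,1]^d, d x N
y = sin(3*X(1,:)) + X(2,:).^2;    % noiseless targets, 1 x N

% training
w = randn(D, d);                                  % random frequencies
b = 2*pi*rand(D, 1);                              % random phases
Z = cos(sqrt(gamma)*w*X + repmat(b, 1, N));       % features, D x N
alpha = (lambda*eye(D) + Z*Z') \ (Z*y');          % weights, D x 1

% testing on fresh points drawn the same way
M = 200;
Xtest = 2*rand(d, M) - 1;
ytest = sin(3*Xtest(1,:)) + Xtest(2,:).^2;
ypred = alpha' * cos(sqrt(gamma)*w*Xtest + repmat(b, 1, M));   % 1 x M
rmse = sqrt(mean((ypred - ytest).^2));
fprintf('test RMSE: %.4f\n', rmse);
```

The linear solve costs on the order of D^3 regardless of N, which is the practical appeal of this approach when the data set is large.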

Page 16:
Page 17:
Page 18:
Page 19:

Randomize

Optimize

